2025-12-04T10:32:01.2978283Z Current runner version: '2.329.0'
2025-12-04T10:32:01.2981220Z Runner name: 'linux.rocm.gpu.gfx942.4.b-bphpw-runner-5l4hk'
2025-12-04T10:32:01.2981642Z Runner group name: 'default'
2025-12-04T10:32:01.2982038Z Machine name: 'linux'
2025-12-04T10:32:01.2983173Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T10:32:01.2984193Z Contents: read
2025-12-04T10:32:01.2984656Z Metadata: read
2025-12-04T10:32:01.2984878Z ##[endgroup]
2025-12-04T10:32:01.2985890Z Secret source: Actions
2025-12-04T10:32:01.2986199Z Prepare workflow directory
2025-12-04T10:32:01.3220522Z Prepare all required actions
2025-12-04T10:32:01.3240193Z Getting action download info
2025-12-04T10:32:01.8070760Z Download action repository 'pytorch/pytorch@main' (SHA:c0cb6e78404416d418350632bfc554710a5f7281)
2025-12-04T10:32:06.5433626Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T10:32:07.8911741Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T10:32:09.0280876Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T10:32:10.0993856Z Getting action download info
2025-12-04T10:32:10.2806092Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T10:32:11.2750728Z Getting action download info
2025-12-04T10:32:11.5345685Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T10:32:12.4470373Z Getting action download info
2025-12-04T10:32:12.6435277Z Uses: pytorch/pytorch/.github/workflows/_rocm-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T10:32:12.6437421Z ##[group] Inputs
2025-12-04T10:32:12.6437570Z build-environment: linux-jammy-rocm-py3.10
2025-12-04T10:32:12.6440943Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T10:32:12.6444321Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T10:32:12.6444611Z sync-tag:
2025-12-04T10:32:12.6445143Z timeout-minutes: 300
2025-12-04T10:32:12.6445278Z tests-to-include:
2025-12-04T10:32:12.6445393Z dashboard-tag:
2025-12-04T10:32:12.6445634Z disable-monitor: true
2025-12-04T10:32:12.6445751Z monitor-log-interval: 5
2025-12-04T10:32:12.6445873Z monitor-data-collect-interval: 1
2025-12-04T10:32:12.6446005Z ##[endgroup]
2025-12-04T10:32:12.6446220Z Complete job name: linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable)
2025-12-04T10:32:12.6732161Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T10:32:12.6732443Z with:
2025-12-04T10:32:12.6732544Z no-sudo: true
2025-12-04T10:32:12.6732642Z submodules: recursive
2025-12-04T10:32:12.6732746Z fetch-depth: 0
2025-12-04T10:32:12.6732887Z env:
2025-12-04T10:32:12.6732981Z GIT_DEFAULT_BRANCH: main
2025-12-04T10:32:12.6733102Z ##[endgroup]
2025-12-04T10:32:12.6777718Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T10:32:12.6778107Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T10:32:12.6784950Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T10:32:12.6785108Z env:
2025-12-04T10:32:12.6785208Z GIT_DEFAULT_BRANCH: main
2025-12-04T10:32:12.6785315Z ##[endgroup]
2025-12-04T10:32:12.6945808Z ##[group]Run actions/checkout@v4
2025-12-04T10:32:12.6946006Z with:
2025-12-04T10:32:12.6946132Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T10:32:12.6946271Z fetch-depth: 0
2025-12-04T10:32:12.6946369Z submodules: recursive
2025-12-04T10:32:12.6946568Z show-progress: false
2025-12-04T10:32:12.6946683Z repository: pytorch/pytorch
2025-12-04T10:32:12.6947035Z token: ***
2025-12-04T10:32:12.6947131Z ssh-strict: true
2025-12-04T10:32:12.6947217Z ssh-user: git
2025-12-04T10:32:12.6947316Z persist-credentials: true
2025-12-04T10:32:12.6947422Z clean: true
2025-12-04T10:32:12.6947534Z sparse-checkout-cone-mode: true
2025-12-04T10:32:12.6947657Z fetch-tags: false
2025-12-04T10:32:12.6947753Z lfs: false
2025-12-04T10:32:12.6947838Z set-safe-directory: true
2025-12-04T10:32:12.6947943Z env:
2025-12-04T10:32:12.6948028Z GIT_DEFAULT_BRANCH: main
2025-12-04T10:32:12.6948129Z ##[endgroup]
2025-12-04T10:32:12.7486151Z Syncing repository: pytorch/pytorch
2025-12-04T10:32:12.7486731Z ##[group]Getting Git version info
2025-12-04T10:32:12.7486913Z Working directory is '/home/runner/_work/pytorch/pytorch'
2025-12-04T10:32:12.7487175Z [command]/usr/bin/git version
2025-12-04T10:32:12.7487292Z git version 2.52.0
2025-12-04T10:32:12.7499607Z ##[endgroup]
2025-12-04T10:32:12.7505087Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/37b56e8c-a39e-4c9e-9610-87829546a25e/.gitconfig'
2025-12-04T10:32:12.7511050Z Temporarily overriding HOME='/home/runner/_work/_temp/37b56e8c-a39e-4c9e-9610-87829546a25e' before making global git config changes
2025-12-04T10:32:12.7511563Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T10:32:12.7514004Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch
2025-12-04T10:32:12.7541354Z [command]/usr/bin/git config --local --get remote.origin.url
2025-12-04T10:32:12.7555671Z https://github.com/pytorch/pytorch
2025-12-04T10:32:12.7574123Z ##[group]Removing previously created refs, to avoid conflicts
2025-12-04T10:32:12.7577400Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD
2025-12-04T10:32:12.7597971Z refs/heads/main
2025-12-04T10:32:12.7608885Z [command]/usr/bin/git checkout --detach
2025-12-04T10:32:14.3195057Z HEAD is now at c0cb6e784044 [DTensor] ExplicitRedistributionContext warning mode (#169452)
2025-12-04T10:32:14.3232061Z [command]/usr/bin/git branch --delete --force main
2025-12-04T10:32:14.3372023Z Deleted branch main (was c0cb6e784044).
2025-12-04T10:32:14.3377371Z ##[endgroup]
2025-12-04T10:32:14.3379760Z [command]/usr/bin/git submodule status
2025-12-04T10:32:14.3568020Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe)
2025-12-04T10:32:14.3625193Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081)
2025-12-04T10:32:14.3665823Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327)
2025-12-04T10:32:14.3718134Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0)
2025-12-04T10:32:14.3753026Z 3ebbc93ded7285963bff932c678fa367eb393ba6 third_party/NVTX (v3.1.0-313-g3ebbc93)
2025-12-04T10:32:14.3811143Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600)
2025-12-04T10:32:14.4128247Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656)
2025-12-04T10:32:14.4151859Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101)
2025-12-04T10:32:14.4171862Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3)
2025-12-04T10:32:14.4228877Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d)
2025-12-04T10:32:14.4304230Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0)
2025-12-04T10:32:14.4374944Z f858c30bcb16f8effd5ff46996f0514539e17abc third_party/cpuinfo (f858c30)
2025-12-04T10:32:14.4412261Z 0b1577c8c83401237d601d0d0db5210506705396 third_party/cudnn_frontend (v0.5-61-g0b1577c)
2025-12-04T10:32:14.4477143Z f88806b1e31dfa579842638740216dd41fc6c588 third_party/cutlass (v4.3.1)
2025-12-04T10:32:14.4501286Z c0b988d39a9e47c794d699f29930ed4d7c7e13a4 third_party/fbgemm (v1.4.0-rc1-2-gc0b988d39)
2025-12-04T10:32:14.4560223Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4)
2025-12-04T10:32:14.4573024Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23)
2025-12-04T10:32:14.4806232Z 407c905e45ad75fc29bf0f9bb7c5c2fd3475976f third_party/fmt (12.1.0)
2025-12-04T10:32:14.4876362Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17)
2025-12-04T10:32:14.4947915Z 54cbae0d3a67fa890b4c3d9ee162b7860315e341 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-37-g54cbae0)
2025-12-04T10:32:14.5087923Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108)
2025-12-04T10:32:14.5161753Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1)
2025-12-04T10:32:14.5211342Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5)
2025-12-04T10:32:14.5326063Z 31f85df8fbd89c188f14ef10f1ec65379786b943 third_party/kineto (heads/main)
2025-12-04T10:32:14.5348217Z d7770c89632329a9914ef1a90289917597639cbe third_party/kleidiai (v1.15.0)
2025-12-04T10:32:14.5364862Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4)
2025-12-04T10:32:14.5382332Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0)
2025-12-04T10:32:14.5590181Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0)
2025-12-04T10:32:14.5608375Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2)
2025-12-04T10:32:14.5626940Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5)
2025-12-04T10:32:14.5842344Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b)
2025-12-04T10:32:14.5889531Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-12-04T10:32:14.5926721Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-12-04T10:32:14.5943421Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1) 2025-12-04T10:32:14.5987032Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-12-04T10:32:14.6040617Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-12-04T10:32:14.6088447Z 2b4cd91092d335a697416b2a3cb398283246849d third_party/tensorpipe (heads/main) 2025-12-04T10:32:14.6097247Z ##[group]Cleaning the repository 2025-12-04T10:32:14.6101637Z [command]/usr/bin/git clean -ffdx 2025-12-04T10:32:14.6228803Z [command]/usr/bin/git reset --hard HEAD 2025-12-04T10:32:14.6968273Z HEAD is now at c0cb6e784044 [DTensor] ExplicitRedistributionContext warning mode (#169452) 2025-12-04T10:32:14.7024884Z ##[endgroup] 2025-12-04T10:32:14.7027603Z ##[group]Disabling automatic garbage collection 2025-12-04T10:32:14.7031613Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T10:32:14.7064245Z ##[endgroup] 2025-12-04T10:32:14.7064434Z ##[group]Setting up auth 2025-12-04T10:32:14.7068113Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T10:32:14.7085348Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T10:32:14.7275946Z Entering 'android/libs/fbjni' 2025-12-04T10:32:14.7305622Z Entering 'third_party/FP16' 2025-12-04T10:32:14.7330167Z Entering 'third_party/FXdiv' 2025-12-04T10:32:14.7366451Z Entering 'third_party/NNPACK' 2025-12-04T10:32:14.7403701Z Entering 'third_party/NVTX' 2025-12-04T10:32:14.7439845Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:14.7465180Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:14.7494426Z Entering 'third_party/aiter' 2025-12-04T10:32:14.7524502Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:14.7567008Z Entering 'third_party/benchmark' 2025-12-04T10:32:14.7592632Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:14.7628012Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:14.7671122Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:14.7699534Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:14.7736205Z Entering 'third_party/cutlass' 2025-12-04T10:32:14.7764287Z Entering 'third_party/fbgemm' 2025-12-04T10:32:14.7790497Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:14.7822009Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:14.7854040Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:14.7879299Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:14.7914919Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:14.7943213Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:14.7972768Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:14.8008111Z Entering 'third_party/flash-attention' 2025-12-04T10:32:14.8033309Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:14.8067590Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:14.8099382Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:14.8124055Z Entering 'third_party/fmt' 2025-12-04T10:32:14.8156233Z Entering 'third_party/gemmlowp/gemmlowp' 
2025-12-04T10:32:14.8194803Z Entering 'third_party/gloo' 2025-12-04T10:32:14.8220774Z Entering 'third_party/googletest' 2025-12-04T10:32:14.8243438Z Entering 'third_party/ideep' 2025-12-04T10:32:14.8266002Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:14.8304465Z Entering 'third_party/ittapi' 2025-12-04T10:32:14.8327358Z Entering 'third_party/kineto' 2025-12-04T10:32:14.8351899Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:14.8376351Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:14.8399190Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:14.8426428Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:14.8454114Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:14.8480082Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:14.8500600Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:14.8522215Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:14.8547447Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:14.8569223Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:14.8597584Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:14.8636962Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:14.8665380Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:14.8691910Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:14.8713844Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:14.8737562Z Entering 'third_party/kleidiai' 2025-12-04T10:32:14.8760284Z Entering 'third_party/mimalloc' 2025-12-04T10:32:14.8788343Z Entering 'third_party/nlohmann' 2025-12-04T10:32:14.8813342Z Entering 'third_party/onnx' 2025-12-04T10:32:14.8857716Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:14.8898226Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:14.8928902Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:14.8960677Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:14.8990894Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:14.9016803Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:14.9039357Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:14.9058874Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:14.9077401Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:14.9098414Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:14.9122909Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:14.9154243Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:14.9192527Z Entering 'third_party/pocketfft' 2025-12-04T10:32:14.9221986Z Entering 'third_party/protobuf' 2025-12-04T10:32:14.9253070Z Entering 'third_party/protobuf/third_party/benchmark' 
2025-12-04T10:32:14.9284947Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:14.9319047Z Entering 'third_party/psimd' 2025-12-04T10:32:14.9345901Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:14.9376437Z Entering 'third_party/pybind11' 2025-12-04T10:32:14.9398091Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:14.9422673Z Entering 'third_party/sleef' 2025-12-04T10:32:14.9452552Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:14.9476953Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:14.9498545Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:14.9520298Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:14.9541144Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:14.9564904Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:14.9603588Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T10:32:14.9621915Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T10:32:14.9786471Z Entering 'android/libs/fbjni' 2025-12-04T10:32:14.9810603Z Entering 'third_party/FP16' 2025-12-04T10:32:14.9835925Z Entering 'third_party/FXdiv' 2025-12-04T10:32:14.9857612Z Entering 'third_party/NNPACK' 2025-12-04T10:32:14.9880292Z Entering 'third_party/NVTX' 2025-12-04T10:32:14.9902201Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:14.9930229Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:14.9958038Z Entering 'third_party/aiter' 2025-12-04T10:32:14.9981157Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:15.0006873Z Entering 'third_party/benchmark' 2025-12-04T10:32:15.0029028Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:15.0054434Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:15.0083694Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:15.0107793Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:15.0128719Z Entering 'third_party/cutlass' 2025-12-04T10:32:15.0153452Z Entering 'third_party/fbgemm' 2025-12-04T10:32:15.0175813Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:15.0205468Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:15.0238479Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:15.0268293Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:15.0294480Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:15.0320956Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:15.0349838Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:15.0372895Z Entering 'third_party/flash-attention' 2025-12-04T10:32:15.0396246Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:15.0424637Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:15.0461401Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:15.0490468Z Entering 'third_party/fmt' 2025-12-04T10:32:15.0517558Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:15.0546494Z Entering 'third_party/gloo' 2025-12-04T10:32:15.0568561Z Entering 'third_party/googletest' 2025-12-04T10:32:15.0592182Z Entering 'third_party/ideep' 2025-12-04T10:32:15.0621121Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:15.0648106Z Entering 
'third_party/ittapi' 2025-12-04T10:32:15.0671788Z Entering 'third_party/kineto' 2025-12-04T10:32:15.0695837Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:15.0721879Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:15.0754397Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:15.0778224Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:15.0801949Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:15.0834526Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:15.0858848Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:15.0893409Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:15.0920534Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:15.0950741Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:15.0977114Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:15.1009409Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:15.1040635Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:15.1071220Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:15.1093727Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:15.1122666Z Entering 'third_party/kleidiai' 2025-12-04T10:32:15.1144517Z Entering 'third_party/mimalloc' 2025-12-04T10:32:15.1166056Z Entering 'third_party/nlohmann' 2025-12-04T10:32:15.1191659Z Entering 'third_party/onnx' 2025-12-04T10:32:15.1225841Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:15.1257264Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:15.1294464Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:15.1324314Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:15.1361276Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:15.1392562Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:15.1419253Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:15.1441029Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:15.1467173Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:15.1489642Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:15.1512054Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:15.1534630Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:15.1562559Z Entering 'third_party/pocketfft' 2025-12-04T10:32:15.1583854Z Entering 'third_party/protobuf' 2025-12-04T10:32:15.1605545Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:15.1626940Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:15.1652393Z Entering 'third_party/psimd' 2025-12-04T10:32:15.1677102Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:15.1699091Z Entering 'third_party/pybind11' 
2025-12-04T10:32:15.1720659Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:15.1741176Z Entering 'third_party/sleef' 2025-12-04T10:32:15.1762474Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:15.1783149Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:15.1805914Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:15.1840438Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:15.1861975Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:15.1889607Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:15.1930939Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.1956179Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T10:32:15.2122397Z Entering 'android/libs/fbjni' 2025-12-04T10:32:15.2138803Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T10:32:15.2147263Z Entering 'third_party/FP16' 2025-12-04T10:32:15.2162383Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T10:32:15.2171578Z Entering 'third_party/FXdiv' 2025-12-04T10:32:15.2183524Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T10:32:15.2191628Z Entering 'third_party/NNPACK' 2025-12-04T10:32:15.2206174Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T10:32:15.2215094Z Entering 'third_party/NVTX' 2025-12-04T10:32:15.2226813Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T10:32:15.2236306Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:15.2248068Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T10:32:15.2255947Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:15.2266211Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T10:32:15.2281353Z Entering 'third_party/aiter' 2025-12-04T10:32:15.2291715Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T10:32:15.2301464Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:15.2314944Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T10:32:15.2334793Z Entering 'third_party/benchmark' 2025-12-04T10:32:15.2347493Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:15.2355942Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:15.2365822Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T10:32:15.2377769Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:15.2390927Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T10:32:15.2400795Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:15.2410753Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T10:32:15.2420981Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:15.2439688Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T10:32:15.2454157Z Entering 'third_party/cutlass' 2025-12-04T10:32:15.2466835Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T10:32:15.2481381Z Entering 'third_party/fbgemm' 2025-12-04T10:32:15.2499632Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T10:32:15.2512325Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:15.2525763Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T10:32:15.2534507Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:15.2548950Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T10:32:15.2561186Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:15.2579514Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T10:32:15.2590850Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:15.2608468Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T10:32:15.2621741Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:15.2636855Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T10:32:15.2648693Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:15.2662421Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T10:32:15.2673787Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:15.2685254Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T10:32:15.2697784Z Entering 'third_party/flash-attention' 2025-12-04T10:32:15.2707081Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T10:32:15.2715058Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:15.2724958Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T10:32:15.2736768Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:15.2746999Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T10:32:15.2759781Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:15.2773581Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T10:32:15.2790948Z Entering 'third_party/fmt' 2025-12-04T10:32:15.2804185Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:15.2817112Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:15.2827208Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T10:32:15.2837274Z Entering 'third_party/gloo' 2025-12-04T10:32:15.2852199Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 
2025-12-04T10:32:15.2863680Z Entering 'third_party/googletest' 2025-12-04T10:32:15.2877966Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:15.2890464Z Entering 'third_party/ideep' 2025-12-04T10:32:15.2902069Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T10:32:15.2910886Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:15.2923470Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T10:32:15.2938399Z Entering 'third_party/ittapi' 2025-12-04T10:32:15.2958603Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T10:32:15.2969854Z Entering 'third_party/kineto' 2025-12-04T10:32:15.2984835Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T10:32:15.2995112Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:15.3005173Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T10:32:15.3015971Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:15.3028298Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T10:32:15.3039048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:15.3051576Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T10:32:15.3061303Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:15.3072239Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:15.3081238Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:15.3098193Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T10:32:15.3111410Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:15.3127141Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T10:32:15.3142189Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:15.3155598Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T10:32:15.3169719Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:15.3186404Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:15.3198068Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:15.3213707Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T10:32:15.3224457Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:15.3236072Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T10:32:15.3251015Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:15.3266226Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:15.3281351Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:15.3292868Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:15.3304856Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:15.3320886Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:15.3335794Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:15.3348318Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T10:32:15.3357720Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:15.3367272Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T10:32:15.3377828Z Entering 'third_party/kleidiai' 2025-12-04T10:32:15.3392398Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T10:32:15.3402264Z Entering 'third_party/mimalloc' 2025-12-04T10:32:15.3412937Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T10:32:15.3422093Z Entering 'third_party/nlohmann' 2025-12-04T10:32:15.3433371Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T10:32:15.3444891Z Entering 'third_party/onnx' 2025-12-04T10:32:15.3454846Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T10:32:15.3471467Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:15.3481988Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:15.3500377Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:15.3510192Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T10:32:15.3519784Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:15.3532156Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:15.3542269Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:15.3553432Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:15.3563131Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:15.3582144Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T10:32:15.3592099Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:15.3606072Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T10:32:15.3621878Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:15.3645644Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T10:32:15.3655062Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:15.3668172Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T10:32:15.3678444Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:15.3690046Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:15.3702936Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:15.3712691Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:15.3722271Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:15.3738266Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:15.3753393Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:15.3763449Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T10:32:15.3780318Z Entering 'third_party/pocketfft' 2025-12-04T10:32:15.3790443Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T10:32:15.3800016Z Entering 'third_party/protobuf' 2025-12-04T10:32:15.3809342Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T10:32:15.3819099Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:15.3832658Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:15.3842618Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:15.3864011Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:15.3873331Z Entering 'third_party/psimd' 2025-12-04T10:32:15.3883807Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T10:32:15.3892961Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:15.3904024Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T10:32:15.3916038Z Entering 'third_party/pybind11' 2025-12-04T10:32:15.3931880Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:15.3941475Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:15.3951602Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T10:32:15.3966581Z Entering 'third_party/sleef' 2025-12-04T10:32:15.3979560Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T10:32:15.3988727Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:15.4001822Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T10:32:15.4010889Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:15.4027025Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:15.4038266Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:15.4048458Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T10:32:15.4057260Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:15.4071415Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T10:32:15.4084445Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:15.4100394Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:15.4109821Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:15.4121901Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T10:32:15.4148453Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4166444Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4183698Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4201508Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4218780Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4232032Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4270820Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4271409Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4286080Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4299665Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4313114Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4327864Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4343195Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4355223Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4370149Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4383387Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4398946Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4412461Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4425796Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4440174Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4454650Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4469966Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4483322Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4496114Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4509352Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T10:32:15.4526382Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4539158Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4552295Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4565784Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4578364Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4591462Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4609524Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4626563Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4646184Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4664167Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4678289Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4693167Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4711319Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4727916Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4741468Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4756058Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4770551Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4784052Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4796351Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4811675Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4824999Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4842521Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4856467Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4870055Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4883961Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4897883Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4910965Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4926534Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4940225Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4955579Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4973802Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.4989046Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T10:32:15.5004190Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5023905Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5039840Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5059613Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5073656Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5088037Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5102285Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5116442Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5129505Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5144427Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5158732Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5175430Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5189212Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5210129Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5224464Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5238492Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5259118Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5279370Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5295714Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5311333Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5328765Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5342770Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5356592Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5370615Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:15.5386508Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T10:32:15.5413105Z ##[endgroup] 2025-12-04T10:32:15.5413287Z ##[group]Fetching the repository 2025-12-04T10:32:15.5420555Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T10:32:19.3642732Z From https://github.com/pytorch/pytorch 2025-12-04T10:32:19.3643337Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-12-04T10:32:19.3643830Z * [new branch] 2.9.1 -> origin/2.9.1 2025-12-04T10:32:19.3644414Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-12-04T10:32:19.3645032Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1 2025-12-04T10:32:19.3645683Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-12-04T10:32:19.3646250Z * [new branch] HOPrintFunc -> origin/HOPrintFunc 2025-12-04T10:32:19.3646745Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1 2025-12-04T10:32:19.3647264Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-12-04T10:32:19.3647773Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-12-04T10:32:19.3648332Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-12-04T10:32:19.3648884Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-12-04T10:32:19.3649398Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-12-04T10:32:19.3649974Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-12-04T10:32:19.3650987Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-12-04T10:32:19.3651449Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T10:32:19.3651740Z * [new branch] activation_bench -> origin/activation_bench 
2025-12-04T10:32:19.3652023Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T10:32:19.3652291Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T10:32:19.3652562Z * [new branch] adi/test -> origin/adi/test 2025-12-04T10:32:19.3652818Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T10:32:19.3653073Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T10:32:19.3653333Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T10:32:19.3653609Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T10:32:19.3653877Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T10:32:19.3654147Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T10:32:19.3654541Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T10:32:19.3654837Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T10:32:19.3655140Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T10:32:19.3655576Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T10:32:19.3655846Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T10:32:19.3656126Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T10:32:19.3656458Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T10:32:19.3656772Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T10:32:19.3657103Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T10:32:19.3657463Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T10:32:19.3657760Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T10:32:19.3658026Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T10:32:19.3658289Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T10:32:19.3658569Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T10:32:19.3658845Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 2025-12-04T10:32:19.3659119Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T10:32:19.3659395Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T10:32:19.3659709Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T10:32:19.3659977Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T10:32:19.3660263Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T10:32:19.3660526Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T10:32:19.3660795Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T10:32:19.3661085Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T10:32:19.3661359Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T10:32:19.3661654Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T10:32:19.3661860Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T10:32:19.3662065Z * [new branch] aoti_const_device -> origin/aoti_const_device 2025-12-04T10:32:19.3662273Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T10:32:19.3662519Z * [new branch] aoti_package_weights_binary -> 
origin/aoti_package_weights_binary 2025-12-04T10:32:19.3662747Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T10:32:19.3663001Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T10:32:19.3663246Z * [new branch] async_tp -> origin/async_tp 2025-12-04T10:32:19.3663488Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T10:32:19.3663768Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T10:32:19.3664019Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T10:32:19.3664228Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-12-04T10:32:19.3664467Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T10:32:19.3664670Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T10:32:19.3664872Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T10:32:19.3665070Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T10:32:19.3665275Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T10:32:19.3665505Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T10:32:19.3665725Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T10:32:19.3665951Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T10:32:19.3666192Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T10:32:19.3666441Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T10:32:19.3666675Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T10:32:19.3666884Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T10:32:19.3667090Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T10:32:19.3667280Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T10:32:19.3667515Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T10:32:19.3667765Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T10:32:19.3667977Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T10:32:19.3668198Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T10:32:19.3668412Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T10:32:19.3668626Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T10:32:19.3668831Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 2025-12-04T10:32:19.3669034Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T10:32:19.3669234Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T10:32:19.3669463Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T10:32:19.3669723Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T10:32:19.3669947Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T10:32:19.3670148Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T10:32:19.3670415Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T10:32:19.3670832Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T10:32:19.3671183Z * [new branch] 
bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T10:32:19.3671405Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T10:32:19.3671609Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T10:32:19.3671781Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T10:32:19.3671962Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T10:32:19.3672184Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T10:32:19.3672465Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T10:32:19.3672669Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T10:32:19.3672878Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T10:32:19.3673096Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T10:32:19.3673292Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T10:32:19.3673496Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3 2025-12-04T10:32:19.3673715Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T10:32:19.3673939Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T10:32:19.3674158Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T10:32:19.3674368Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T10:32:19.3674573Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T10:32:19.3674779Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T10:32:19.3675002Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T10:32:19.3675220Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T10:32:19.3675429Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T10:32:19.3675643Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T10:32:19.3675855Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T10:32:19.3676060Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T10:32:19.3676271Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T10:32:19.3676488Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T10:32:19.3676696Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T10:32:19.3676943Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T10:32:19.3677149Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T10:32:19.3677362Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T10:32:19.3677616Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T10:32:19.3677844Z * [new branch] bwd-backup -> origin/bwd-backup 2025-12-04T10:32:19.3678010Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T10:32:19.3678174Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 
2025-12-04T10:32:19.3678345Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T10:32:19.3678548Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T10:32:19.3678758Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T10:32:19.3679000Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3679328Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3679656Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3679937Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3680212Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3680486Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3680768Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3681040Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3681323Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3681600Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3681869Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3682143Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3682421Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3682698Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3682970Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3683249Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3683520Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3683793Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3684066Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T10:32:19.3684301Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T10:32:19.3684540Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T10:32:19.3684752Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T10:32:19.3684934Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T10:32:19.3685114Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T10:32:19.3685298Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T10:32:19.3685470Z * [new branch] ci_attn -> origin/ci_attn 2025-12-04T10:32:19.3685635Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T10:32:19.3685897Z * [new branch] 
codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T10:32:19.3686205Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T10:32:19.3686518Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T10:32:19.3686887Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T10:32:19.3687196Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T10:32:19.3687377Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T10:32:19.3687552Z * [new branch] context_test -> origin/context_test 2025-12-04T10:32:19.3687789Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T10:32:19.3688033Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T10:32:19.3688256Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T10:32:19.3688509Z * [new branch] crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering 2025-12-04T10:32:19.3688734Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T10:32:19.3688949Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T10:32:19.3689160Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T10:32:19.3689347Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T10:32:19.3689542Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T10:32:19.3689766Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T10:32:19.3689935Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T10:32:19.3690121Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T10:32:19.3690293Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T10:32:19.3690478Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T10:32:19.3690674Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T10:32:19.3690854Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T10:32:19.3691044Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T10:32:19.3691231Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T10:32:19.3691411Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T10:32:19.3691653Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T10:32:19.3691888Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T10:32:19.3692115Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T10:32:19.3692311Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T10:32:19.3692501Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T10:32:19.3692675Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T10:32:19.3692875Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T10:32:19.3693070Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T10:32:19.3693274Z * [new branch] csl/test_cuda_build_large_runner -> 
origin/csl/test_cuda_build_large_runner 2025-12-04T10:32:19.3693526Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T10:32:19.3693778Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T10:32:19.3693996Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T10:32:19.3694223Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T10:32:19.3694395Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T10:32:19.3694564Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T10:32:19.3694736Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T10:32:19.3694920Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T10:32:19.3695115Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T10:32:19.3695305Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T10:32:19.3695476Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T10:32:19.3695657Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T10:32:19.3695991Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T10:32:19.3696456Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T10:32:19.3696791Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T10:32:19.3697040Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T10:32:19.3697277Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T10:32:19.3697479Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T10:32:19.3697676Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T10:32:19.3697858Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T10:32:19.3698043Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T10:32:19.3698249Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T10:32:19.3698467Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T10:32:19.3698693Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T10:32:19.3698935Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T10:32:19.3699117Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T10:32:19.3699299Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T10:32:19.3699484Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T10:32:19.3699783Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T10:32:19.3699980Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T10:32:19.3700161Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T10:32:19.3700339Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T10:32:19.3700516Z * [new branch] docs -> origin/docs 2025-12-04T10:32:19.3700688Z * [new branch] documentation -> origin/documentation 2025-12-04T10:32:19.3700872Z * [new branch] eager_model_benchmarks -> 
origin/eager_model_benchmarks 2025-12-04T10:32:19.3701084Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T10:32:19.3701309Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T10:32:19.3701566Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T10:32:19.3701765Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T10:32:19.3701939Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T10:32:19.3702105Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T10:32:19.3702270Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T10:32:19.3702439Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T10:32:19.3702603Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T10:32:19.3702788Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T10:32:19.3703019Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T10:32:19.3703278Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T10:32:19.3703534Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T10:32:19.3703812Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T10:32:19.3704105Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T10:32:19.3704413Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T10:32:19.3704677Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T10:32:19.3704908Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T10:32:19.3705160Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T10:32:19.3705380Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T10:32:19.3705650Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T10:32:19.3705920Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T10:32:19.3706140Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T10:32:19.3706458Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T10:32:19.3706734Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T10:32:19.3706993Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T10:32:19.3707260Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T10:32:19.3707532Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T10:32:19.3707816Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T10:32:19.3708050Z * [new branch] exec -> origin/exec 2025-12-04T10:32:19.3708235Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T10:32:19.3708428Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T10:32:19.3708674Z * [new branch] 
export-D71412006 -> origin/export-D71412006 2025-12-04T10:32:19.3708857Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T10:32:19.3709063Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T10:32:19.3709239Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T10:32:19.3709412Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T10:32:19.3709646Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T10:32:19.3709824Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T10:32:19.3710001Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T10:32:19.3710180Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T10:32:19.3710353Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T10:32:19.3710527Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T10:32:19.3710700Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T10:32:19.3710872Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T10:32:19.3711042Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T10:32:19.3711210Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T10:32:19.3711385Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T10:32:19.3711553Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T10:32:19.3711726Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T10:32:19.3711897Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T10:32:19.3712070Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T10:32:19.3712246Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T10:32:19.3712418Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T10:32:19.3712589Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T10:32:19.3712767Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T10:32:19.3712940Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T10:32:19.3713107Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T10:32:19.3713329Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T10:32:19.3713502Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T10:32:19.3713674Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T10:32:19.3713902Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T10:32:19.3714142Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T10:32:19.3714338Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T10:32:19.3714524Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T10:32:19.3714725Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T10:32:19.3714920Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T10:32:19.3715117Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T10:32:19.3715320Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T10:32:19.3715492Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T10:32:19.3715657Z * [new branch] fca -> origin/fca 2025-12-04T10:32:19.3715852Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T10:32:19.3716205Z * [new branch] fca5 -> 
origin/fca5 2025-12-04T10:32:19.3716384Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T10:32:19.3716581Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T10:32:19.3716776Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T10:32:19.3716969Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T10:32:19.3717151Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T10:32:19.3717342Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T10:32:19.3717535Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T10:32:19.3717726Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T10:32:19.3717920Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T10:32:19.3718117Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T10:32:19.3718321Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T10:32:19.3718519Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T10:32:19.3718740Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T10:32:19.3718947Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T10:32:19.3719129Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T10:32:19.3719305Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T10:32:19.3719505Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T10:32:19.3719754Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T10:32:19.3719944Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T10:32:19.3720241Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T10:32:19.3720419Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T10:32:19.3720641Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T10:32:19.3720814Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T10:32:19.3720987Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T10:32:19.3721164Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T10:32:19.3721351Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T10:32:19.3721560Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T10:32:19.3721755Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T10:32:19.3721961Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T10:32:19.3722209Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T10:32:19.3722430Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T10:32:19.3722610Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T10:32:19.3722784Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T10:32:19.3722946Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T10:32:19.3723156Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T10:32:19.3723388Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T10:32:19.3723645Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T10:32:19.3723859Z * [new branch] 
gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T10:32:19.3724042Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T10:32:19.3724228Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T10:32:19.3724430Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T10:32:19.3724622Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T10:32:19.3724813Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T10:32:19.3725001Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T10:32:19.3725185Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T10:32:19.3725366Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T10:32:19.3725547Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T10:32:19.3725723Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T10:32:19.3725906Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T10:32:19.3726089Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T10:32:19.3726265Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T10:32:19.3726449Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T10:32:19.3726629Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T10:32:19.3726805Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T10:32:19.3726989Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T10:32:19.3727169Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T10:32:19.3727347Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T10:32:19.3727556Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T10:32:19.3727736Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T10:32:19.3727914Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T10:32:19.3728098Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T10:32:19.3728293Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T10:32:19.3728495Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T10:32:19.3728699Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T10:32:19.3728906Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T10:32:19.3729104Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T10:32:19.3729309Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T10:32:19.3729506Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T10:32:19.3729759Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T10:32:19.3729995Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T10:32:19.3730193Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T10:32:19.3730395Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T10:32:19.3730597Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T10:32:19.3730800Z * [new branch] gh/IvanKobzarev/163/base -> 
origin/gh/IvanKobzarev/163/base 2025-12-04T10:32:19.3731007Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T10:32:19.3731208Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T10:32:19.3731405Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T10:32:19.3731614Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T10:32:19.3731816Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T10:32:19.3732013Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T10:32:19.3732214Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T10:32:19.3732420Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T10:32:19.3732618Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T10:32:19.3732824Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T10:32:19.3733027Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T10:32:19.3733229Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T10:32:19.3733433Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T10:32:19.3733635Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T10:32:19.3733833Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T10:32:19.3734055Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T10:32:19.3734257Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T10:32:19.3734503Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T10:32:19.3734705Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T10:32:19.3734911Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T10:32:19.3735111Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T10:32:19.3735317Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T10:32:19.3735522Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T10:32:19.3735723Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T10:32:19.3735926Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T10:32:19.3736124Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T10:32:19.3736329Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T10:32:19.3736535Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T10:32:19.3736733Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T10:32:19.3736959Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T10:32:19.3737161Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T10:32:19.3737358Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T10:32:19.3737565Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T10:32:19.3737766Z * [new branch] gh/IvanKobzarev/176/head -> 
origin/gh/IvanKobzarev/176/head 2025-12-04T10:32:19.3737965Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T10:32:19.3738169Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T10:32:19.3738371Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T10:32:19.3738571Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T10:32:19.3738778Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T10:32:19.3738982Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T10:32:19.3739179Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T10:32:19.3739384Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T10:32:19.3739646Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T10:32:19.3739846Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T10:32:19.3740050Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T10:32:19.3740252Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T10:32:19.3740456Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T10:32:19.3740658Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T10:32:19.3740859Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T10:32:19.3741060Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T10:32:19.3741262Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T10:32:19.3741512Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T10:32:19.3741709Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T10:32:19.3741910Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T10:32:19.3742112Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T10:32:19.3742316Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T10:32:19.3742518Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T10:32:19.3742715Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T10:32:19.3742917Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T10:32:19.3743120Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T10:32:19.3743326Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T10:32:19.3743527Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T10:32:19.3743726Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T10:32:19.3743952Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T10:32:19.3744154Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T10:32:19.3744358Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T10:32:19.3744553Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T10:32:19.3744751Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 
2025-12-04T10:32:19.3744947Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T10:32:19.3745135Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T10:32:19.3745315Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T10:32:19.3745493Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T10:32:19.3745674Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T10:32:19.3745851Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T10:32:19.3746027Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T10:32:19.3746206Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T10:32:19.3746380Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T10:32:19.3746556Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T10:32:19.3746729Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T10:32:19.3746905Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T10:32:19.3747082Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T10:32:19.3747259Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T10:32:19.3747435Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T10:32:19.3747606Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T10:32:19.3747785Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T10:32:19.3747965Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T10:32:19.3748137Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T10:32:19.3748343Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T10:32:19.3748520Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T10:32:19.3748693Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T10:32:19.3748871Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T10:32:19.3749048Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T10:32:19.3749221Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T10:32:19.3749396Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T10:32:19.3749625Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T10:32:19.3749798Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T10:32:19.3749984Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T10:32:19.3750161Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T10:32:19.3750332Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T10:32:19.3750508Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T10:32:19.3750717Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T10:32:19.3750896Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T10:32:19.3751071Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T10:32:19.3751243Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T10:32:19.3751421Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T10:32:19.3751612Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T10:32:19.3751813Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T10:32:19.3752009Z * [new branch] gh/PaulZhang12/25/orig -> 
origin/gh/PaulZhang12/25/orig 2025-12-04T10:32:19.3752208Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T10:32:19.3752405Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T10:32:19.3752605Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T10:32:19.3752800Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T10:32:19.3752990Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T10:32:19.3753186Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T10:32:19.3753380Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T10:32:19.3753575Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T10:32:19.3753768Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T10:32:19.3753963Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T10:32:19.3754157Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T10:32:19.3754350Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T10:32:19.3754548Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T10:32:19.3754737Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T10:32:19.3754930Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T10:32:19.3755153Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T10:32:19.3755347Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T10:32:19.3755539Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T10:32:19.3755730Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T10:32:19.3755926Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T10:32:19.3756122Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T10:32:19.3756314Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T10:32:19.3756569Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T10:32:19.3756759Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T10:32:19.3756950Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T10:32:19.3757140Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T10:32:19.3757331Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T10:32:19.3757558Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T10:32:19.3757751Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T10:32:19.3757941Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T10:32:19.3758127Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T10:32:19.3758322Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T10:32:19.3758513Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T10:32:19.3758710Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T10:32:19.3758909Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 
2025-12-04T10:32:19.3759107Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T10:32:19.3759309Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T10:32:19.3759510Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T10:32:19.3759752Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 2025-12-04T10:32:19.3759946Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T10:32:19.3760179Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T10:32:19.3760405Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T10:32:19.3760760Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T10:32:19.3760998Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T10:32:19.3761225Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T10:32:19.3761482Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T10:32:19.3761723Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T10:32:19.3761948Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T10:32:19.3762202Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T10:32:19.3762445Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T10:32:19.3762713Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T10:32:19.3762959Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T10:32:19.3763313Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T10:32:19.3763546Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T10:32:19.3763807Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T10:32:19.3764093Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T10:32:19.3764374Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T10:32:19.3764618Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 2025-12-04T10:32:19.3764982Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T10:32:19.3765209Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T10:32:19.3765476Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T10:32:19.3765789Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T10:32:19.3766017Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T10:32:19.3791010Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T10:32:19.3791280Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T10:32:19.3791497Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T10:32:19.3791706Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T10:32:19.3791933Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T10:32:19.3792132Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T10:32:19.3792354Z * 
[new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T10:32:19.3792622Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T10:32:19.3792861Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T10:32:19.3793123Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T10:32:19.3793347Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T10:32:19.3793561Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T10:32:19.3793765Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T10:32:19.3794019Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T10:32:19.3794225Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T10:32:19.3794437Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T10:32:19.3794656Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T10:32:19.3794844Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T10:32:19.3795049Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T10:32:19.3795243Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T10:32:19.3795429Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T10:32:19.3795713Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T10:32:19.3795900Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T10:32:19.3796106Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T10:32:19.3796301Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T10:32:19.3796491Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T10:32:19.3796675Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T10:32:19.3796866Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T10:32:19.3797057Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T10:32:19.3797246Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T10:32:19.3797441Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T10:32:19.3797632Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T10:32:19.3797816Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T10:32:19.3798004Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T10:32:19.3798216Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T10:32:19.3798405Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T10:32:19.3798586Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T10:32:19.3798764Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T10:32:19.3798941Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T10:32:19.3799118Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T10:32:19.3799297Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T10:32:19.3799478Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T10:32:19.3799701Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 
2025-12-04T10:32:19.3799887Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T10:32:19.3800066Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T10:32:19.3800244Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T10:32:19.3800423Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T10:32:19.3800609Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T10:32:19.3800796Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T10:32:19.3800982Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T10:32:19.3801172Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T10:32:19.3801363Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T10:32:19.3801550Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T10:32:19.3801736Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T10:32:19.3801922Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T10:32:19.3802108Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T10:32:19.3802297Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T10:32:19.3802537Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T10:32:19.3802728Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T10:32:19.3802914Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T10:32:19.3803102Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T10:32:19.3803291Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T10:32:19.3803476Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T10:32:19.3803663Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T10:32:19.3803849Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T10:32:19.3804038Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T10:32:19.3804227Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T10:32:19.3804412Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T10:32:19.3804599Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T10:32:19.3804818Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T10:32:19.3805005Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T10:32:19.3805188Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T10:32:19.3805375Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T10:32:19.3805561Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T10:32:19.3805746Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T10:32:19.3805939Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T10:32:19.3806126Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-12-04T10:32:19.3806311Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T10:32:19.3806506Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T10:32:19.3806692Z * [new branch] 
gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T10:32:19.3806875Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T10:32:19.3807064Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T10:32:19.3807252Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T10:32:19.3807444Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T10:32:19.3807635Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T10:32:19.3807824Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T10:32:19.3808015Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T10:32:19.3808209Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T10:32:19.3808394Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T10:32:19.3808583Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T10:32:19.3808770Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T10:32:19.3808952Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T10:32:19.3809178Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T10:32:19.3809366Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T10:32:19.3809553Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T10:32:19.3809788Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T10:32:19.3809974Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 2025-12-04T10:32:19.3810156Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T10:32:19.3810345Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T10:32:19.3810530Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T10:32:19.3810714Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T10:32:19.3810910Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T10:32:19.3811099Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T10:32:19.3811287Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T10:32:19.3811532Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T10:32:19.3811717Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T10:32:19.3811901Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T10:32:19.3812087Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T10:32:19.3812270Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T10:32:19.3812460Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T10:32:19.3812653Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T10:32:19.3812838Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T10:32:19.3813032Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T10:32:19.3813228Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T10:32:19.3813420Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T10:32:19.3813611Z * [new branch] 
gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T10:32:19.3813803Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-12-04T10:32:19.3813989Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T10:32:19.3814175Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T10:32:19.3814364Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T10:32:19.3814551Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T10:32:19.3814743Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T10:32:19.3814938Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T10:32:19.3815123Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T10:32:19.3815309Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T10:32:19.3815493Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T10:32:19.3815678Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T10:32:19.3815898Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T10:32:19.3816090Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T10:32:19.3816276Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T10:32:19.3816464Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T10:32:19.3816647Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T10:32:19.3816832Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T10:32:19.3817015Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T10:32:19.3817198Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T10:32:19.3817386Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T10:32:19.3817573Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-12-04T10:32:19.3817752Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T10:32:19.3817929Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T10:32:19.3818103Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T10:32:19.3818393Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T10:32:19.3818669Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T10:32:19.3818875Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T10:32:19.3819071Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T10:32:19.3819268Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T10:32:19.3819469Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T10:32:19.3819710Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T10:32:19.3819911Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T10:32:19.3820115Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T10:32:19.3820311Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T10:32:19.3820502Z * [new branch] gh/amjames/18/base 
-> origin/gh/amjames/18/base 2025-12-04T10:32:19.3820683Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T10:32:19.3820860Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T10:32:19.3821048Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T10:32:19.3821238Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T10:32:19.3821423Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T10:32:19.3821610Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-12-04T10:32:19.3821792Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T10:32:19.3821981Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T10:32:19.3822167Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T10:32:19.3822350Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T10:32:19.3822539Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T10:32:19.3822766Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T10:32:19.3822951Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T10:32:19.3823140Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T10:32:19.3823327Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T10:32:19.3823514Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T10:32:19.3823701Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T10:32:19.3823889Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T10:32:19.3824075Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T10:32:19.3824264Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T10:32:19.3824455Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T10:32:19.3824638Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T10:32:19.3824828Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T10:32:19.3825053Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T10:32:19.3825241Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T10:32:19.3825423Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-12-04T10:32:19.3825605Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T10:32:19.3825783Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T10:32:19.3825964Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T10:32:19.3826150Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T10:32:19.3826330Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T10:32:19.3826513Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T10:32:19.3826695Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T10:32:19.3826874Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T10:32:19.3827057Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T10:32:19.3827239Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T10:32:19.3827418Z * [new 
branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T10:32:19.3827600Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T10:32:19.3827784Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T10:32:19.3827962Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T10:32:19.3828143Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T10:32:19.3828327Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T10:32:19.3828505Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T10:32:19.3828688Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T10:32:19.3828869Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T10:32:19.3829048Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T10:32:19.3829229Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig 2025-12-04T10:32:19.3829439Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T10:32:19.3829662Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T10:32:19.3829846Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T10:32:19.3830028Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T10:32:19.3830210Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T10:32:19.3830391Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T10:32:19.3830570Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T10:32:19.3830750Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T10:32:19.3830931Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T10:32:19.3831113Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T10:32:19.3831293Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T10:32:19.3831473Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T10:32:19.3831687Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T10:32:19.3831872Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T10:32:19.3832055Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T10:32:19.3832235Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T10:32:19.3832415Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T10:32:19.3832595Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T10:32:19.3832777Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T10:32:19.3832960Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T10:32:19.3833139Z * [new branch] gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T10:32:19.3833327Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T10:32:19.3833506Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T10:32:19.3833684Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T10:32:19.3833870Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T10:32:19.3834051Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T10:32:19.3834231Z * [new 
branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T10:32:19.3834413Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T10:32:19.3834675Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T10:32:19.3834857Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T10:32:19.3835040Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T10:32:19.3835222Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T10:32:19.3835402Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T10:32:19.3835590Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T10:32:19.3835780Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T10:32:19.3835998Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T10:32:19.3836190Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T10:32:19.3836380Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T10:32:19.3836564Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T10:32:19.3836750Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T10:32:19.3836936Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-12-04T10:32:19.3837122Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T10:32:19.3837311Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T10:32:19.3837496Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T10:32:19.3837683Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T10:32:19.3837875Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T10:32:19.3838061Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T10:32:19.3838285Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T10:32:19.3838472Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T10:32:19.3838655Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T10:32:19.3838842Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T10:32:19.3839028Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T10:32:19.3839213Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T10:32:19.3839400Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T10:32:19.3839625Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T10:32:19.3839813Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T10:32:19.3840003Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T10:32:19.3840193Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T10:32:19.3840377Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T10:32:19.3840565Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T10:32:19.3840751Z * [new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T10:32:19.3840937Z * [new branch] 
gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T10:32:19.3841124Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T10:32:19.3841310Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T10:32:19.3841492Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T10:32:19.3841680Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T10:32:19.3841866Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T10:32:19.3842052Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T10:32:19.3842237Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T10:32:19.3842420Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T10:32:19.3842645Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T10:32:19.3842832Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T10:32:19.3843016Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T10:32:19.3843204Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T10:32:19.3843395Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T10:32:19.3843579Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T10:32:19.3843766Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T10:32:19.3843951Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T10:32:19.3844137Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T10:32:19.3844329Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head 2025-12-04T10:32:19.3844517Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T10:32:19.3844700Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T10:32:19.3844930Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T10:32:19.3845115Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T10:32:19.3845301Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T10:32:19.3845487Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T10:32:19.3845672Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T10:32:19.3845861Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T10:32:19.3846046Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T10:32:19.3846232Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T10:32:19.3846416Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T10:32:19.3846605Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T10:32:19.3846791Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T10:32:19.3846974Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T10:32:19.3847158Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T10:32:19.3847343Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T10:32:19.3847532Z 
* [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T10:32:19.3847717Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T10:32:19.3847901Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T10:32:19.3848090Z * [new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T10:32:19.3848280Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T10:32:19.3848462Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T10:32:19.3848647Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T10:32:19.3848833Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T10:32:19.3849018Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T10:32:19.3849246Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T10:32:19.3849432Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T10:32:19.3849651Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T10:32:19.3849840Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T10:32:19.3850027Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T10:32:19.3850211Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T10:32:19.3850397Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T10:32:19.3850584Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T10:32:19.3850775Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T10:32:19.3850961Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T10:32:19.3851147Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T10:32:19.3851336Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T10:32:19.3851572Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T10:32:19.3851757Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head 2025-12-04T10:32:19.3851943Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T10:32:19.3852128Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T10:32:19.3852313Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T10:32:19.3852504Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T10:32:19.3852693Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T10:32:19.3852878Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T10:32:19.3853066Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T10:32:19.3853252Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T10:32:19.3853439Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T10:32:19.3853629Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T10:32:19.3853815Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T10:32:19.3853998Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 
2025-12-04T10:32:19.3854185Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T10:32:19.3854369Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T10:32:19.3854555Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T10:32:19.3854741Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T10:32:19.3854930Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T10:32:19.3855115Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T10:32:19.3855303Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T10:32:19.3855492Z * [new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T10:32:19.3855732Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T10:32:19.3855919Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T10:32:19.3856103Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T10:32:19.3856286Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T10:32:19.3856474Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T10:32:19.3856661Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T10:32:19.3856847Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T10:32:19.3857032Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T10:32:19.3857216Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T10:32:19.3857407Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T10:32:19.3857594Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T10:32:19.3857778Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T10:32:19.3858006Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T10:32:19.3858191Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T10:32:19.3858375Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T10:32:19.3858562Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T10:32:19.3858748Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T10:32:19.3858935Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T10:32:19.3859124Z * [new branch] gh/anijain2305/969/head -> origin/gh/anijain2305/969/head 2025-12-04T10:32:19.3859311Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T10:32:19.3859497Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T10:32:19.3859739Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T10:32:19.3859924Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T10:32:19.3860109Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T10:32:19.3860294Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T10:32:19.3860474Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T10:32:19.3860659Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 
2025-12-04T10:32:19.3860841Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T10:32:19.3861017Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T10:32:19.3861194Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T10:32:19.3861375Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T10:32:19.3861549Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T10:32:19.3861728Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T10:32:19.3861907Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T10:32:19.3862081Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T10:32:19.3862257Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T10:32:19.3862505Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T10:32:19.3862686Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T10:32:19.3862868Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T10:32:19.3863051Z * [new branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T10:32:19.3863229Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T10:32:19.3863408Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T10:32:19.3863587Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T10:32:19.3863763Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T10:32:19.3863949Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T10:32:19.3864134Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T10:32:19.3864310Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T10:32:19.3864491Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T10:32:19.3864718Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T10:32:19.3864897Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T10:32:19.3865078Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T10:32:19.3865254Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T10:32:19.3865433Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T10:32:19.3865616Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T10:32:19.3865794Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T10:32:19.3865975Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T10:32:19.3866156Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T10:32:19.3866337Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T10:32:19.3866518Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T10:32:19.3866700Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 2025-12-04T10:32:19.3866878Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T10:32:19.3867061Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T10:32:19.3867242Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T10:32:19.3867422Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 
2025-12-04T10:32:19.3867605Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T10:32:19.3867789Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T10:32:19.3867974Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T10:32:19.3868159Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T10:32:19.3868340Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T10:32:19.3868520Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T10:32:19.3868703Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T10:32:19.3868924Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T10:32:19.3869106Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T10:32:19.3869289Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T10:32:19.3869469Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T10:32:19.3869701Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T10:32:19.3869886Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T10:32:19.3870068Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T10:32:19.3870251Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T10:32:19.3870434Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T10:32:19.3870620Z * [new branch] gh/aorenste/147/base -> origin/gh/aorenste/147/base 2025-12-04T10:32:19.3870806Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T10:32:19.3870989Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T10:32:19.3871219Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T10:32:19.3871401Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T10:32:19.3871585Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T10:32:19.3871765Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T10:32:19.3871950Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T10:32:19.3872134Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T10:32:19.3872317Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T10:32:19.3872502Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T10:32:19.3872683Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T10:32:19.3872868Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T10:32:19.3873051Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T10:32:19.3873233Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T10:32:19.3873419Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T10:32:19.3873601Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T10:32:19.3873782Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T10:32:19.3873967Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T10:32:19.3874150Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T10:32:19.3874334Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 
2025-12-04T10:32:19.3874517Z * [new branch] gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T10:32:19.3874699Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T10:32:19.3874878Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T10:32:19.3875060Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T10:32:19.3875244Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T10:32:19.3875425Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T10:32:19.3875659Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T10:32:19.3875843Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T10:32:19.3876023Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T10:32:19.3876208Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T10:32:19.3876394Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T10:32:19.3876573Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T10:32:19.3876755Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T10:32:19.3876933Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T10:32:19.3877117Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T10:32:19.3877305Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T10:32:19.3877483Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T10:32:19.3877663Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T10:32:19.3877895Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T10:32:19.3878091Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T10:32:19.3878288Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 2025-12-04T10:32:19.3878487Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T10:32:19.3878680Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T10:32:19.3878873Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T10:32:19.3879057Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T10:32:19.3879237Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T10:32:19.3879424Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T10:32:19.3879660Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T10:32:19.3879840Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T10:32:19.3880023Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T10:32:19.3880203Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T10:32:19.3880379Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T10:32:19.3880561Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T10:32:19.3880742Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T10:32:19.3880921Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T10:32:19.3881100Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T10:32:19.3881281Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 
2025-12-04T10:32:19.3881462Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T10:32:19.3881643Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T10:32:19.3881819Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T10:32:19.3882000Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T10:32:19.3882245Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 2025-12-04T10:32:19.3882422Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T10:32:19.3882602Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T10:32:19.3882675Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T10:32:19.3882753Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T10:32:19.3882824Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T10:32:19.3882895Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T10:32:19.3882968Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T10:32:19.3883038Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T10:32:19.3883109Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T10:32:19.3883181Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T10:32:19.3883251Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T10:32:19.3883322Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T10:32:19.3883448Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T10:32:19.3883520Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T10:32:19.3883592Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T10:32:19.3883663Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T10:32:19.3883734Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T10:32:19.3883834Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T10:32:19.3883924Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T10:32:19.3884011Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T10:32:19.3884102Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-12-04T10:32:19.3884188Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T10:32:19.3884275Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T10:32:19.3884365Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T10:32:19.3884452Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T10:32:19.3884540Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T10:32:19.3884632Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T10:32:19.3884718Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T10:32:19.3884804Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T10:32:19.3884895Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T10:32:19.3884982Z * [new branch] gh/benjaminglass1/108/head -> 
origin/gh/benjaminglass1/108/head 2025-12-04T10:32:19.3885071Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T10:32:19.3885157Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T10:32:19.3885242Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T10:32:19.3885375Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T10:32:19.3885458Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T10:32:19.3885541Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T10:32:19.3885627Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T10:32:19.3885707Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T10:32:19.3885783Z * [new branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T10:32:19.3885861Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T10:32:19.3885938Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T10:32:19.3886013Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T10:32:19.3886090Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T10:32:19.3886163Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T10:32:19.3886237Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T10:32:19.3886340Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T10:32:19.3886413Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T10:32:19.3886487Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T10:32:19.3886562Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T10:32:19.3886637Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T10:32:19.3886716Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T10:32:19.3886790Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T10:32:19.3886864Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T10:32:19.3886944Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T10:32:19.3887016Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T10:32:19.3887092Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T10:32:19.3887168Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T10:32:19.3887242Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T10:32:19.3887316Z * [new branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T10:32:19.3887393Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T10:32:19.3887467Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T10:32:19.3887539Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T10:32:19.3887617Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T10:32:19.3887692Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T10:32:19.3887765Z * [new branch] gh/bobrenjc93/681/base -> 
origin/gh/bobrenjc93/681/base
[... several hundred additional "* [new branch] gh/<user>/<n>/<base|head|orig> -> origin/gh/<user>/<n>/<base|head|orig>" git fetch lines omitted; they list new remote-tracking branches for users bobrenjc93, c00w, clee2000, coconutruben, colinchan15, d4l3k, davidberard98, desertfire, dharakk, drisspg, dsjohns2, dzmitry-huba, eellison, etaf, exclamaforte, ezyang, fadara01, fduwjj, fegin, fffrog, fxdawnn, galv, guangyey, and guilhermeleobas ...]
2025-12-04T10:32:19.3957297Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T10:32:19.3957384Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T10:32:19.3957473Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T10:32:19.3957559Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T10:32:19.3957651Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T10:32:19.3957741Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T10:32:19.3957830Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T10:32:19.3957952Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T10:32:19.3958040Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T10:32:19.3958128Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T10:32:19.3958219Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T10:32:19.3958309Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T10:32:19.3958396Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T10:32:19.3958486Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T10:32:19.3958574Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T10:32:19.3958662Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T10:32:19.3958751Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T10:32:19.3958838Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T10:32:19.3958951Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T10:32:19.3959041Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T10:32:19.3959129Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T10:32:19.3959219Z * [new branch] gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T10:32:19.3959306Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T10:32:19.3959395Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T10:32:19.3959486Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T10:32:19.3959608Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T10:32:19.3959699Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T10:32:19.3959789Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T10:32:19.3959878Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T10:32:19.3959965Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T10:32:19.3960054Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T10:32:19.3960141Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T10:32:19.3960228Z * 
[new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T10:32:19.3960319Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T10:32:19.3960407Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T10:32:19.3960494Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T10:32:19.3960581Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T10:32:19.3960667Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T10:32:19.3960754Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T10:32:19.3960841Z * [new branch] gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T10:32:19.3960985Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T10:32:19.3961073Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T10:32:19.3961160Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T10:32:19.3961249Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T10:32:19.3961335Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T10:32:19.3961422Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T10:32:19.3961510Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T10:32:19.3961597Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T10:32:19.3961687Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T10:32:19.3961775Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T10:32:19.3961862Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T10:32:19.3961986Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T10:32:19.3962073Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T10:32:19.3962159Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T10:32:19.3962246Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T10:32:19.3962334Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T10:32:19.3962422Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T10:32:19.3962508Z * [new branch] gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T10:32:19.3962596Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T10:32:19.3962679Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T10:32:19.3962758Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T10:32:19.3962833Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T10:32:19.3962908Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T10:32:19.3962983Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T10:32:19.3963058Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 
2025-12-04T10:32:19.3963132Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T10:32:19.3963206Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T10:32:19.3963279Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T10:32:19.3963354Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T10:32:19.3963428Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T10:32:19.3963498Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T10:32:19.3963567Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T10:32:19.3963635Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T10:32:19.3963731Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T10:32:19.3963796Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T10:32:19.3963860Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T10:32:19.3963926Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T10:32:19.3963992Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-12-04T10:32:19.3964066Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T10:32:19.3964134Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T10:32:19.3964201Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T10:32:19.3964269Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T10:32:19.3964336Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T10:32:19.3964403Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T10:32:19.3964469Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T10:32:19.3964535Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T10:32:19.3964634Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T10:32:19.3964700Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T10:32:19.3964765Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T10:32:19.3964832Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T10:32:19.3964899Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T10:32:19.3964966Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T10:32:19.3965033Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T10:32:19.3965105Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T10:32:19.3965177Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T10:32:19.3965251Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T10:32:19.3965323Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T10:32:19.3965394Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T10:32:19.3965464Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T10:32:19.3965533Z * [new branch] gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T10:32:19.3965604Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T10:32:19.3965675Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T10:32:19.3965745Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 
2025-12-04T10:32:19.3965815Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T10:32:19.3965886Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T10:32:19.3965955Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T10:32:19.3966025Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T10:32:19.3966094Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T10:32:19.3966163Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T10:32:19.3966278Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T10:32:19.3966347Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T10:32:19.3966418Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T10:32:19.3966489Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T10:32:19.3966559Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T10:32:19.3966628Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T10:32:19.3966701Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T10:32:19.3966770Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T10:32:19.3966839Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T10:32:19.3966913Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T10:32:19.3966980Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T10:32:19.3967048Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T10:32:19.3967117Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T10:32:19.3967215Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T10:32:19.3967283Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T10:32:19.3967353Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T10:32:19.3967421Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T10:32:19.3967494Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T10:32:19.3967565Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T10:32:19.3967634Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T10:32:19.3967707Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T10:32:19.3967777Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T10:32:19.3967849Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T10:32:19.3967921Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T10:32:19.3967989Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T10:32:19.3968057Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T10:32:19.3968129Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T10:32:19.3968201Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T10:32:19.3968271Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T10:32:19.3968342Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T10:32:19.3968413Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T10:32:19.3968483Z * [new branch] 
gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T10:32:19.3968556Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-12-04T10:32:19.3968625Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T10:32:19.3968694Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T10:32:19.3968765Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T10:32:19.3968867Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T10:32:19.3968938Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T10:32:19.3969008Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T10:32:19.3969077Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T10:32:19.3969150Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T10:32:19.3969219Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T10:32:19.3969287Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T10:32:19.3969359Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T10:32:19.3969429Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T10:32:19.3969502Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T10:32:19.3969631Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T10:32:19.3969703Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T10:32:19.3969821Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T10:32:19.3969894Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T10:32:19.3969964Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T10:32:19.3970034Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T10:32:19.3970105Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T10:32:19.3970173Z * [new branch] gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T10:32:19.3970245Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T10:32:19.3970316Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T10:32:19.3970386Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T10:32:19.3970457Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T10:32:19.3970529Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T10:32:19.3970597Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T10:32:19.3970665Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T10:32:19.3970736Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T10:32:19.3970805Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T10:32:19.3970881Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T10:32:19.3970950Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T10:32:19.3971021Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T10:32:19.3971093Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T10:32:19.3971163Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T10:32:19.3971233Z * [new branch] 
gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T10:32:19.3971305Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T10:32:19.3971374Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T10:32:19.3971443Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T10:32:19.3971560Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T10:32:19.3971630Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 2025-12-04T10:32:19.3971700Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T10:32:19.3971773Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T10:32:19.3971843Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T10:32:19.3971913Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T10:32:19.3971985Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T10:32:19.3972055Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T10:32:19.3972125Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T10:32:19.3972198Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T10:32:19.3972268Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T10:32:19.3972339Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T10:32:19.3972441Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T10:32:19.3972512Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T10:32:19.3972581Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T10:32:19.3972649Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T10:32:19.3972717Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T10:32:19.3972786Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T10:32:19.3972856Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T10:32:19.3972924Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T10:32:19.3972993Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T10:32:19.3973062Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T10:32:19.3973127Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-12-04T10:32:19.3973195Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T10:32:19.3973262Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T10:32:19.3973328Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T10:32:19.3973395Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T10:32:19.3973464Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T10:32:19.3973531Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T10:32:19.3973598Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T10:32:19.3973665Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T10:32:19.3973732Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T10:32:19.3973801Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T10:32:19.3973868Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 
2025-12-04T10:32:19.3973937Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T10:32:19.3974004Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T10:32:19.3974102Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T10:32:19.3974170Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T10:32:19.3974238Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T10:32:19.3974306Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T10:32:19.3974373Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T10:32:19.3974440Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T10:32:19.3974506Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T10:32:19.3974577Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T10:32:19.3974645Z * [new branch] gh/jansel/557/head -> origin/gh/jansel/557/head 2025-12-04T10:32:19.3974713Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T10:32:19.3974783Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T10:32:19.3974850Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T10:32:19.3974942Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T10:32:19.3975013Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T10:32:19.3975080Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T10:32:19.3975145Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T10:32:19.3975213Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T10:32:19.3975280Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T10:32:19.3975348Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T10:32:19.3975418Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T10:32:19.3975485Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T10:32:19.3975555Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T10:32:19.3975621Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T10:32:19.3975687Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T10:32:19.3975756Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T10:32:19.3975822Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T10:32:19.3975889Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T10:32:19.3975959Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T10:32:19.3976026Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T10:32:19.3976093Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T10:32:19.3976163Z * [new branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T10:32:19.3976228Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T10:32:19.3976295Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T10:32:19.3976364Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T10:32:19.3976429Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T10:32:19.3976496Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T10:32:19.3976610Z * [new branch] 
gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T10:32:19.3976677Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T10:32:19.3976744Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T10:32:19.3976814Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T10:32:19.3976880Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T10:32:19.3976946Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T10:32:19.3977015Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T10:32:19.3977082Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T10:32:19.3977151Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T10:32:19.3977222Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T10:32:19.3977288Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T10:32:19.3977356Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T10:32:19.3977451Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T10:32:19.3977519Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T10:32:19.3977588Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T10:32:19.3977654Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 2025-12-04T10:32:19.3977721Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T10:32:19.3977790Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T10:32:19.3977861Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T10:32:19.3977928Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T10:32:19.3977997Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T10:32:19.3978064Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T10:32:19.3978133Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T10:32:19.3978201Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T10:32:19.3978267Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T10:32:19.3978333Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T10:32:19.3978401Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T10:32:19.3978471Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T10:32:19.3978538Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T10:32:19.3978607Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T10:32:19.3978672Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T10:32:19.3978755Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T10:32:19.3978837Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T10:32:19.3978914Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T10:32:19.3978991Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T10:32:19.3979066Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T10:32:19.3979370Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-12-04T10:32:19.3979444Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T10:32:19.3979515Z * [new 
branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T10:32:19.3979703Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T10:32:19.3979782Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T10:32:19.3979853Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T10:32:19.3979925Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T10:32:19.3979999Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T10:32:19.3980070Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T10:32:19.3980146Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T10:32:19.3980218Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T10:32:19.3980287Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T10:32:19.3980403Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T10:32:19.3980475Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T10:32:19.3980546Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T10:32:19.3980619Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T10:32:19.3980689Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T10:32:19.3980758Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T10:32:19.3980835Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T10:32:19.3980906Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T10:32:19.3980977Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T10:32:19.3981050Z * [new branch] gh/jiayisunx/79/orig -> origin/gh/jiayisunx/79/orig 2025-12-04T10:32:19.3981121Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T10:32:19.3981190Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T10:32:19.3981260Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T10:32:19.3981331Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T10:32:19.3981400Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T10:32:19.3981475Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T10:32:19.3981547Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T10:32:19.3981619Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T10:32:19.3981692Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T10:32:19.3981763Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T10:32:19.3981833Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T10:32:19.3981905Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T10:32:19.3981976Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T10:32:19.3982050Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T10:32:19.3982170Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T10:32:19.3982241Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T10:32:19.3982313Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T10:32:19.3982386Z * [new 
branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T10:32:19.3982457Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T10:32:19.3982528Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T10:32:19.3982598Z * [new branch] gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T10:32:19.3982669Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T10:32:19.3982741Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T10:32:19.3982814Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T10:32:19.3982885Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T10:32:19.3982958Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T10:32:19.3983060Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T10:32:19.3983137Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T10:32:19.3983214Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T10:32:19.3983283Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T10:32:19.3983352Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T10:32:19.3983422Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T10:32:19.3983487Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T10:32:19.3983554Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T10:32:19.3983621Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T10:32:19.3983698Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T10:32:19.3983774Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T10:32:19.3983847Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T10:32:19.3983920Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T10:32:19.3983996Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T10:32:19.3984070Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T10:32:19.3984141Z * [new branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T10:32:19.3984216Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T10:32:19.3984289Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T10:32:19.3984363Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T10:32:19.3984438Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T10:32:19.3984509Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T10:32:19.3984581Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T10:32:19.3984655Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T10:32:19.3984758Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T10:32:19.3984831Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T10:32:19.3984906Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T10:32:19.3984980Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T10:32:19.3985054Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 
2025-12-04T10:32:19.3985128Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T10:32:19.3985201Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T10:32:19.3985276Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T10:32:19.3985349Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T10:32:19.3985423Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T10:32:19.3985497Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T10:32:19.3985571Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T10:32:19.3985683Z * [new branch] gh/karthickai/18/orig -> origin/gh/karthickai/18/orig 2025-12-04T10:32:19.3985757Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T10:32:19.3985829Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T10:32:19.3985900Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T10:32:19.3985976Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T10:32:19.3986049Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T10:32:19.3986122Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T10:32:19.3986196Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T10:32:19.3986269Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T10:32:19.3986342Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T10:32:19.3986416Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T10:32:19.3986488Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T10:32:19.3986560Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T10:32:19.3986634Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T10:32:19.3986705Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T10:32:19.3986778Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T10:32:19.3986852Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T10:32:19.3986923Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T10:32:19.3986999Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T10:32:19.3987072Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T10:32:19.3987144Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 2025-12-04T10:32:19.3987217Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T10:32:19.3987289Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T10:32:19.3987361Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T10:32:19.3987464Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T10:32:19.3987537Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T10:32:19.3987610Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T10:32:19.3987684Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T10:32:19.3987751Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T10:32:19.3987819Z * [new 
branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T10:32:19.3987887Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T10:32:19.3987952Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T10:32:19.3988016Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T10:32:19.3988084Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T10:32:19.3988164Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T10:32:19.3988241Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T10:32:19.3988354Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T10:32:19.3988429Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T10:32:19.3988503Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T10:32:19.3988578Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T10:32:19.3988652Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T10:32:19.3988733Z * [new branch] gh/kurtamohler/62/head -> origin/gh/kurtamohler/62/head 2025-12-04T10:32:19.3988805Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T10:32:19.3988878Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T10:32:19.3988954Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T10:32:19.3989030Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T10:32:19.3989105Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T10:32:19.3989180Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T10:32:19.3989255Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T10:32:19.3989329Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T10:32:19.3989407Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T10:32:19.3989481Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T10:32:19.3989554Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T10:32:19.3989671Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T10:32:19.3989747Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T10:32:19.3989820Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T10:32:19.3989895Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T10:32:19.3989968Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T10:32:19.3990042Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T10:32:19.3990164Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T10:32:19.3990234Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T10:32:19.3990306Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-12-04T10:32:19.3990378Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T10:32:19.3990448Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T10:32:19.3990520Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T10:32:19.3990590Z * [new branch] gh/kwen2501/187/orig -> 
origin/gh/kwen2501/187/orig 2025-12-04T10:32:19.3990660Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T10:32:19.3990732Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T10:32:19.3990803Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T10:32:19.3990872Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T10:32:19.3990943Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T10:32:19.3991059Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T10:32:19.3991128Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T10:32:19.3991200Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T10:32:19.3991269Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T10:32:19.3991338Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T10:32:19.3991410Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T10:32:19.3991480Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T10:32:19.3991550Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T10:32:19.3991620Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T10:32:19.3991690Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T10:32:19.3991760Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T10:32:19.3991828Z * [new branch] gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T10:32:19.3991897Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T10:32:19.3991968Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T10:32:19.3992036Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T10:32:19.3992107Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T10:32:19.3992179Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T10:32:19.3992247Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T10:32:19.3992317Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T10:32:19.3992387Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T10:32:19.3992455Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T10:32:19.3992524Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T10:32:19.3992594Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T10:32:19.3992663Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T10:32:19.3992757Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T10:32:19.3992828Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T10:32:19.3992897Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T10:32:19.3992968Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T10:32:19.3993039Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T10:32:19.3993108Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T10:32:19.3993180Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T10:32:19.3993249Z * [new branch] gh/kwen2501/252/head -> 
origin/gh/kwen2501/252/head 2025-12-04T10:32:19.3993320Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T10:32:19.3993391Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T10:32:19.3993460Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T10:32:19.3993529Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T10:32:19.3993628Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T10:32:19.3993696Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T10:32:19.3993765Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T10:32:19.3993836Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T10:32:19.3993905Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T10:32:19.3993974Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T10:32:19.3994044Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T10:32:19.3994113Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T10:32:19.3994182Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T10:32:19.3994253Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T10:32:19.3994322Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T10:32:19.3994391Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T10:32:19.3994464Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T10:32:19.3994534Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T10:32:19.3994605Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T10:32:19.3994675Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T10:32:19.3994744Z * [new branch] gh/kwen2501/274/head -> origin/gh/kwen2501/274/head 2025-12-04T10:32:19.3994815Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T10:32:19.3994886Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T10:32:19.3994954Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T10:32:19.3995024Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T10:32:19.3995093Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T10:32:19.3995162Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T10:32:19.3995262Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T10:32:19.3995331Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T10:32:19.3995400Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T10:32:19.3995471Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T10:32:19.3995542Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T10:32:19.3995611Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T10:32:19.3995682Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T10:32:19.3995751Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T10:32:19.3995819Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T10:32:19.3995892Z * [new branch] gh/kwen2501/279/orig -> 
origin/gh/kwen2501/279/orig 2025-12-04T10:32:19.3995962Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T10:32:19.3996031Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T10:32:19.3996137Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T10:32:19.3996205Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T10:32:19.3996275Z * [new branch] gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T10:32:19.3996343Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T10:32:19.3996412Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T10:32:19.3996483Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T10:32:19.3996553Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T10:32:19.3996622Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T10:32:19.3996694Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T10:32:19.3996765Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T10:32:19.3996834Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T10:32:19.3996905Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T10:32:19.3996975Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T10:32:19.3997044Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T10:32:19.3997114Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T10:32:19.3997184Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T10:32:19.3997254Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T10:32:19.3997329Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T10:32:19.3997400Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T10:32:19.3997469Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T10:32:19.3997542Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T10:32:19.3997611Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T10:32:19.3997683Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 2025-12-04T10:32:19.3997752Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T10:32:19.3997848Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T10:32:19.3997927Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T10:32:19.3998003Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T10:32:19.3998079Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T10:32:19.3998155Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T10:32:19.3998227Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T10:32:19.3998300Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T10:32:19.3998375Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T10:32:19.3998448Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T10:32:19.3998524Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T10:32:19.3998600Z * [new 
branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T10:32:19.3998671Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T10:32:19.3998772Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T10:32:19.3998849Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T10:32:19.3998925Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T10:32:19.3998999Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T10:32:19.3999074Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T10:32:19.3999149Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T10:32:19.3999223Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 2025-12-04T10:32:19.3999296Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T10:32:19.3999370Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T10:32:19.3999447Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T10:32:19.3999520Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T10:32:19.3999623Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T10:32:19.3999698Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T10:32:19.3999771Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T10:32:19.3999847Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T10:32:19.3999922Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T10:32:19.3999994Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T10:32:19.4000069Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T10:32:19.4000144Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T10:32:19.4000217Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T10:32:19.4000290Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T10:32:19.4000366Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T10:32:19.4000440Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T10:32:19.4000555Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T10:32:19.4000631Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T10:32:19.4000704Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T10:32:19.4000780Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 2025-12-04T10:32:19.4000854Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T10:32:19.4000927Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T10:32:19.4001002Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T10:32:19.4001073Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T10:32:19.4001145Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T10:32:19.4001221Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T10:32:19.4001294Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 
2025-12-04T10:32:19.4001367Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T10:32:19.4001485Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T10:32:19.4001559Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T10:32:19.4001630Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T10:32:19.4001705Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T10:32:19.4001779Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T10:32:19.4001857Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T10:32:19.4001933Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T10:32:19.4002003Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T10:32:19.4002072Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T10:32:19.4002141Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T10:32:19.4002217Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T10:32:19.4002291Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T10:32:19.4002358Z * [new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T10:32:19.4002421Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T10:32:19.4002487Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T10:32:19.4002548Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T10:32:19.4002609Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T10:32:19.4002669Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T10:32:19.4002731Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T10:32:19.4002791Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T10:32:19.4002856Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T10:32:19.4002923Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T10:32:19.4002995Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T10:32:19.4003066Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T10:32:19.4003164Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T10:32:19.4003233Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T10:32:19.4003302Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T10:32:19.4003371Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T10:32:19.4003438Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T10:32:19.4003505Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T10:32:19.4003572Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T10:32:19.4003638Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T10:32:19.4003706Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T10:32:19.4003774Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T10:32:19.4003841Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 2025-12-04T10:32:19.4003911Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T10:32:19.4004009Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T10:32:19.4004077Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 
2025-12-04T10:32:19.4004145Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T10:32:19.4004212Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T10:32:19.4004280Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T10:32:19.4004345Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T10:32:19.4004414Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T10:32:19.4004484Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T10:32:19.4004550Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T10:32:19.4004619Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T10:32:19.4004687Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T10:32:19.4004753Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T10:32:19.4004820Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T10:32:19.4004888Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T10:32:19.4004954Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T10:32:19.4005022Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T10:32:19.4005093Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T10:32:19.4005159Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T10:32:19.4005225Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T10:32:19.4005295Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T10:32:19.4005362Z * [new branch] gh/malfet/586/orig -> origin/gh/malfet/586/orig 2025-12-04T10:32:19.4005429Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T10:32:19.4005497Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T10:32:19.4005563Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T10:32:19.4005668Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T10:32:19.4005736Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T10:32:19.4005801Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T10:32:19.4005869Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T10:32:19.4005935Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T10:32:19.4006003Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T10:32:19.4006070Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T10:32:19.4006137Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T10:32:19.4006205Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T10:32:19.4006276Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T10:32:19.4006341Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T10:32:19.4006406Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T10:32:19.4006475Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T10:32:19.4006567Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T10:32:19.4006635Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T10:32:19.4006704Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T10:32:19.4006771Z * [new branch] 
gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T10:32:19.4006839Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T10:32:19.4006910Z * [new branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T10:32:19.4006976Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T10:32:19.4007043Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T10:32:19.4007111Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T10:32:19.4007179Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T10:32:19.4007246Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T10:32:19.4007317Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T10:32:19.4007384Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T10:32:19.4007453Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T10:32:19.4007521Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T10:32:19.4007587Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T10:32:19.4007655Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T10:32:19.4007722Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T10:32:19.4007790Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T10:32:19.4007859Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T10:32:19.4007924Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T10:32:19.4007991Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T10:32:19.4008057Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T10:32:19.4008153Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T10:32:19.4008220Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T10:32:19.4008289Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T10:32:19.4008355Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 2025-12-04T10:32:19.4008425Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T10:32:19.4008493Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T10:32:19.4008560Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T10:32:19.4008626Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T10:32:19.4008697Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T10:32:19.4008766Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T10:32:19.4008834Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T10:32:19.4008903Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T10:32:19.4008970Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T10:32:19.4009064Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T10:32:19.4009132Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T10:32:19.4009197Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T10:32:19.4009267Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T10:32:19.4009334Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T10:32:19.4009403Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 
2025-12-04T10:32:19.4009476Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T10:32:19.4009544Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T10:32:19.4009646Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T10:32:19.4009715Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T10:32:19.4009782Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T10:32:19.4009848Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T10:32:19.4009916Z * [new branch] gh/malfet/608/head -> origin/gh/malfet/608/head 2025-12-04T10:32:19.4009983Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T10:32:19.4010050Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T10:32:19.4010123Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T10:32:19.4010190Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T10:32:19.4010256Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T10:32:19.4010326Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T10:32:19.4010393Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T10:32:19.4010459Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T10:32:19.4010528Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T10:32:19.4010595Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T10:32:19.4010662Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T10:32:19.4010781Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T10:32:19.4010848Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T10:32:19.4010916Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T10:32:19.4010985Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T10:32:19.4011075Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T10:32:19.4011164Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T10:32:19.4011249Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T10:32:19.4011319Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T10:32:19.4011395Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T10:32:19.4011468Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 2025-12-04T10:32:19.4011539Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T10:32:19.4011610Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T10:32:19.4011724Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T10:32:19.4011794Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T10:32:19.4011867Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T10:32:19.4011937Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T10:32:19.4012006Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T10:32:19.4012080Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T10:32:19.4012150Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T10:32:19.4012219Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 
2025-12-04T10:32:19.4012293Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T10:32:19.4012362Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T10:32:19.4012433Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T10:32:19.4012505Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T10:32:19.4012575Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T10:32:19.4012677Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T10:32:19.4012774Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T10:32:19.4012867Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T10:32:19.4012962Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T10:32:19.4013055Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T10:32:19.4013147Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T10:32:19.4013240Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T10:32:19.4013332Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T10:32:19.4013422Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T10:32:19.4013545Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T10:32:19.4013635Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T10:32:19.4013726Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T10:32:19.4013821Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T10:32:19.4013912Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T10:32:19.4014003Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T10:32:19.4014095Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T10:32:19.4014187Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T10:32:19.4014280Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T10:32:19.4014371Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T10:32:19.4014461Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T10:32:19.4014581Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T10:32:19.4014672Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T10:32:19.4014762Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 2025-12-04T10:32:19.4014856Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T10:32:19.4014950Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T10:32:19.4015043Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T10:32:19.4015137Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T10:32:19.4015228Z * [new branch] 
gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T10:32:19.4015320Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T10:32:19.4015414Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T10:32:19.4015505Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T10:32:19.4015597Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T10:32:19.4015686Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T10:32:19.4015779Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T10:32:19.4015875Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T10:32:19.4015966Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T10:32:19.4016057Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T10:32:19.4016151Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T10:32:19.4016242Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T10:32:19.4016334Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T10:32:19.4016427Z * [new branch] gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base 2025-12-04T10:32:19.4016553Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T10:32:19.4016645Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T10:32:19.4016739Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T10:32:19.4016829Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T10:32:19.4016922Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T10:32:19.4017013Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T10:32:19.4017103Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T10:32:19.4017197Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T10:32:19.4017290Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T10:32:19.4017381Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T10:32:19.4017475Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T10:32:19.4017589Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T10:32:19.4017681Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T10:32:19.4017775Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T10:32:19.4017868Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T10:32:19.4017960Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T10:32:19.4018054Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T10:32:19.4018146Z * [new branch] gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 
2025-12-04T10:32:19.4018238Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T10:32:19.4018332Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T10:32:19.4018423Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T10:32:19.4018515Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T10:32:19.4018606Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T10:32:19.4018698Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T10:32:19.4018793Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T10:32:19.4018885Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T10:32:19.4018977Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T10:32:19.4019073Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T10:32:19.4019165Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T10:32:19.4019256Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T10:32:19.4019348Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T10:32:19.4019439Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T10:32:19.4019558Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T10:32:19.4019691Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T10:32:19.4019783Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 2025-12-04T10:32:19.4019878Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T10:32:19.4019968Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T10:32:19.4020060Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T10:32:19.4020154Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T10:32:19.4020246Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T10:32:19.4020339Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T10:32:19.4020430Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T10:32:19.4020522Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T10:32:19.4020664Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T10:32:19.4020758Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T10:32:19.4020849Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T10:32:19.4020941Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T10:32:19.4021033Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T10:32:19.4021124Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T10:32:19.4021215Z * [new branch] gh/mikaylagawarecki/372/orig -> 
origin/gh/mikaylagawarecki/372/orig 2025-12-04T10:32:19.4021309Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T10:32:19.4021400Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T10:32:19.4021491Z * [new branch] gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig 2025-12-04T10:32:19.4021583Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T10:32:19.4021674Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T10:32:19.4021765Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T10:32:19.4021857Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T10:32:19.4021948Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T10:32:19.4022042Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T10:32:19.4022132Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T10:32:19.4022223Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T10:32:19.4022316Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T10:32:19.4022407Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T10:32:19.4022496Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T10:32:19.4022636Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T10:32:19.4022727Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T10:32:19.4022822Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T10:32:19.4022915Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T10:32:19.4023007Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T10:32:19.4023100Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T10:32:19.4023191Z * [new branch] gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T10:32:19.4023282Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T10:32:19.4023377Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T10:32:19.4023468Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T10:32:19.4023558Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T10:32:19.4023684Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T10:32:19.4023777Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T10:32:19.4023868Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T10:32:19.4023962Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T10:32:19.4024054Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T10:32:19.4024148Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T10:32:19.4024239Z * [new branch] 
gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T10:32:19.4024330Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T10:32:19.4024424Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T10:32:19.4024516Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T10:32:19.4024608Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T10:32:19.4024704Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T10:32:19.4024795Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 2025-12-04T10:32:19.4024886Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T10:32:19.4024979Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T10:32:19.4025070Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T10:32:19.4025162Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T10:32:19.4025255Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T10:32:19.4025351Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T10:32:19.4025442Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T10:32:19.4025537Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T10:32:19.4025661Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T10:32:19.4025753Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T10:32:19.4025844Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T10:32:19.4025937Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T10:32:19.4026028Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T10:32:19.4026119Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T10:32:19.4026209Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T10:32:19.4026303Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T10:32:19.4026397Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T10:32:19.4026488Z * [new branch] gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head 2025-12-04T10:32:19.4026582Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T10:32:19.4026718Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T10:32:19.4026810Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T10:32:19.4026904Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T10:32:19.4026972Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T10:32:19.4027041Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T10:32:19.4027107Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T10:32:19.4027176Z * [new branch] gh/mlazos/42/base -> 
origin/gh/mlazos/42/base 2025-12-04T10:32:19.4027244Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T10:32:19.4027311Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T10:32:19.4027378Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T10:32:19.4027446Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T10:32:19.4027512Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T10:32:19.4027578Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T10:32:19.4027646Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T10:32:19.4027712Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T10:32:19.4027780Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T10:32:19.4027849Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T10:32:19.4027913Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T10:32:19.4027981Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T10:32:19.4028048Z * [new branch] gh/mlazos/48/head -> origin/gh/mlazos/48/head 2025-12-04T10:32:19.4028114Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T10:32:19.4028180Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T10:32:19.4028247Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T10:32:19.4028311Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T10:32:19.4028404Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T10:32:19.4028472Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T10:32:19.4028538Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T10:32:19.4028605Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T10:32:19.4028673Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T10:32:19.4028739Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T10:32:19.4028806Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T10:32:19.4028872Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T10:32:19.4028938Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T10:32:19.4029009Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T10:32:19.4029074Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T10:32:19.4029140Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T10:32:19.4029209Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T10:32:19.4029303Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T10:32:19.4029367Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T10:32:19.4029435Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T10:32:19.4029521Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T10:32:19.4029619Z * [new branch] gh/mlazos/55/orig -> origin/gh/mlazos/55/orig 2025-12-04T10:32:19.4029691Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T10:32:19.4029757Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T10:32:19.4029821Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T10:32:19.4029889Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 
2025-12-04T10:32:19.4029956Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T10:32:19.4030022Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T10:32:19.4030092Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T10:32:19.4030157Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T10:32:19.4030223Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T10:32:19.4030292Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T10:32:19.4030358Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T10:32:19.4030426Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T10:32:19.4030492Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T10:32:19.4030559Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T10:32:19.4030625Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T10:32:19.4030691Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T10:32:19.4030756Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T10:32:19.4030824Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T10:32:19.4030936Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T10:32:19.4031003Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T10:32:19.4031071Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T10:32:19.4031136Z * [new branch] gh/mlazos/63/base -> origin/gh/mlazos/63/base 2025-12-04T10:32:19.4031204Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T10:32:19.4031272Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T10:32:19.4031338Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T10:32:19.4031404Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T10:32:19.4031473Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T10:32:19.4031538Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T10:32:19.4031606Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T10:32:19.4031673Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T10:32:19.4031739Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T10:32:19.4031852Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T10:32:19.4031921Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T10:32:19.4031987Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T10:32:19.4032055Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T10:32:19.4032120Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T10:32:19.4032186Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T10:32:19.4032254Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T10:32:19.4032321Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T10:32:19.4032388Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T10:32:19.4032457Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T10:32:19.4032522Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T10:32:19.4032588Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T10:32:19.4032655Z * [new branch] 
gh/mlazos/70/head -> origin/gh/mlazos/70/head 2025-12-04T10:32:19.4032723Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T10:32:19.4032789Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T10:32:19.4032856Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T10:32:19.4032922Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T10:32:19.4032988Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T10:32:19.4033058Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T10:32:19.4033124Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T10:32:19.4033190Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T10:32:19.4033258Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T10:32:19.4033324Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T10:32:19.4033390Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T10:32:19.4033494Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T10:32:19.4033568Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T10:32:19.4033644Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T10:32:19.4033717Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T10:32:19.4033802Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T10:32:19.4033886Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T10:32:19.4033967Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T10:32:19.4034048Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T10:32:19.4034130Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T10:32:19.4034210Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-12-04T10:32:19.4034290Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T10:32:19.4034372Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T10:32:19.4034481Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T10:32:19.4034560Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T10:32:19.4034640Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T10:32:19.4034720Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T10:32:19.4034799Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T10:32:19.4034885Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T10:32:19.4034963Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T10:32:19.4035043Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T10:32:19.4035123Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T10:32:19.4035202Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T10:32:19.4035282Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T10:32:19.4035361Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T10:32:19.4035440Z * [new branch] 
gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T10:32:19.4035520Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T10:32:19.4035600Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T10:32:19.4035679Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T10:32:19.4035760Z * [new branch] gh/naveenthangudu/9/base -> origin/gh/naveenthangudu/9/base 2025-12-04T10:32:19.4035841Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T10:32:19.4035921Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T10:32:19.4035996Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T10:32:19.4036069Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T10:32:19.4036141Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T10:32:19.4036215Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T10:32:19.4036318Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T10:32:19.4036390Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T10:32:19.4036462Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T10:32:19.4036534Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T10:32:19.4036606Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T10:32:19.4036680Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T10:32:19.4036751Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T10:32:19.4036823Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T10:32:19.4036896Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T10:32:19.4036966Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T10:32:19.4037038Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T10:32:19.4037109Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T10:32:19.4037222Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T10:32:19.4037295Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T10:32:19.4037366Z * [new branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T10:32:19.4037437Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T10:32:19.4037509Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T10:32:19.4037581Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T10:32:19.4037653Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T10:32:19.4037726Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T10:32:19.4037797Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T10:32:19.4037870Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T10:32:19.4037943Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T10:32:19.4038013Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T10:32:19.4038083Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T10:32:19.4038155Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 
2025-12-04T10:32:19.4038226Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T10:32:19.4038296Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T10:32:19.4038369Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T10:32:19.4038439Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T10:32:19.4038513Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T10:32:19.4038583Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T10:32:19.4038653Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T10:32:19.4038725Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T10:32:19.4038795Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T10:32:19.4038896Z * [new branch] gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T10:32:19.4038968Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T10:32:19.4039039Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T10:32:19.4039105Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T10:32:19.4039175Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T10:32:19.4039242Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T10:32:19.4039307Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T10:32:19.4039377Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T10:32:19.4039443Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T10:32:19.4039508Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T10:32:19.4039611Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T10:32:19.4039678Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T10:32:19.4039744Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T10:32:19.4039859Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T10:32:19.4039926Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T10:32:19.4039993Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T10:32:19.4040061Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T10:32:19.4040127Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T10:32:19.4040193Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T10:32:19.4040259Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T10:32:19.4040324Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T10:32:19.4040392Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T10:32:19.4040458Z * [new branch] gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T10:32:19.4040522Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T10:32:19.4040589Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T10:32:19.4040654Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T10:32:19.4040720Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T10:32:19.4040787Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T10:32:19.4040854Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T10:32:19.4040918Z * [new branch] gh/oulgen/18/orig -> 
origin/gh/oulgen/18/orig 2025-12-04T10:32:19.4040986Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T10:32:19.4041053Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T10:32:19.4041119Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T10:32:19.4041185Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T10:32:19.4041249Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T10:32:19.4041313Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T10:32:19.4041382Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T10:32:19.4041491Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T10:32:19.4041557Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T10:32:19.4041625Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T10:32:19.4041693Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T10:32:19.4041761Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T10:32:19.4041826Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T10:32:19.4041892Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T10:32:19.4041960Z * [new branch] gh/oulgen/23/orig -> origin/gh/oulgen/23/orig 2025-12-04T10:32:19.4042025Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T10:32:19.4042094Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T10:32:19.4042162Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T10:32:19.4042228Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T10:32:19.4042323Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T10:32:19.4042390Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T10:32:19.4042455Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T10:32:19.4042520Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T10:32:19.4042587Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T10:32:19.4042654Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T10:32:19.4042722Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T10:32:19.4042789Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T10:32:19.4042856Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T10:32:19.4042921Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T10:32:19.4042990Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T10:32:19.4043054Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T10:32:19.4043119Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T10:32:19.4043187Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T10:32:19.4043252Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T10:32:19.4043318Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T10:32:19.4043385Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T10:32:19.4043489Z * [new branch] gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization 2025-12-04T10:32:19.4043558Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T10:32:19.4043626Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 
2025-12-04T10:32:19.4043692Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T10:32:19.4043761Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T10:32:19.4043825Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T10:32:19.4043890Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T10:32:19.4043988Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T10:32:19.4044054Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T10:32:19.4044120Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T10:32:19.4044185Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T10:32:19.4044252Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T10:32:19.4044317Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T10:32:19.4044385Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T10:32:19.4044450Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T10:32:19.4044515Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T10:32:19.4044586Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T10:32:19.4044651Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T10:32:19.4044716Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T10:32:19.4044782Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T10:32:19.4044875Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T10:32:19.4044942Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T10:32:19.4045008Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-12-04T10:32:19.4045074Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T10:32:19.4045141Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T10:32:19.4045206Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T10:32:19.4045274Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T10:32:19.4045341Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T10:32:19.4045407Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T10:32:19.4045475Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T10:32:19.4045543Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T10:32:19.4045609Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T10:32:19.4045674Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T10:32:19.4045742Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T10:32:19.4045809Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T10:32:19.4045877Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T10:32:19.4045944Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T10:32:19.4046008Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T10:32:19.4046075Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T10:32:19.4046142Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T10:32:19.4046207Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T10:32:19.4046275Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T10:32:19.4046342Z * [new branch] 
gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T10:32:19.4046408Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T10:32:19.4046510Z * [new branch] gh/pearu/147/head -> origin/gh/pearu/147/head 2025-12-04T10:32:19.4046577Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T10:32:19.4046642Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T10:32:19.4046710Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T10:32:19.4046775Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T10:32:19.4046842Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T10:32:19.4046909Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T10:32:19.4046974Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T10:32:19.4047039Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T10:32:19.4047109Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T10:32:19.4047174Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T10:32:19.4047240Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T10:32:19.4047308Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T10:32:19.4047405Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T10:32:19.4047471Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T10:32:19.4047539Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T10:32:19.4047604Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T10:32:19.4047670Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T10:32:19.4047739Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T10:32:19.4047804Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T10:32:19.4047870Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T10:32:19.4047938Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T10:32:19.4048006Z * [new branch] gh/pearu/155/orig -> origin/gh/pearu/155/orig 2025-12-04T10:32:19.4048072Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T10:32:19.4048141Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T10:32:19.4054013Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T10:32:19.4054096Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T10:32:19.4054170Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T10:32:19.4054236Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T10:32:19.4054301Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T10:32:19.4054367Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T10:32:19.4054437Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T10:32:19.4054510Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T10:32:19.4054583Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T10:32:19.4054656Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T10:32:19.4054724Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T10:32:19.4054854Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T10:32:19.4054923Z * [new branch] gh/pianpwk/29/base -> 
origin/gh/pianpwk/29/base 2025-12-04T10:32:19.4054991Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T10:32:19.4055060Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T10:32:19.4055130Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T10:32:19.4055198Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T10:32:19.4055268Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T10:32:19.4055337Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T10:32:19.4055406Z * [new branch] gh/pianpwk/31/head -> origin/gh/pianpwk/31/head 2025-12-04T10:32:19.4055475Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T10:32:19.4055546Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T10:32:19.4055615Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T10:32:19.4055684Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T10:32:19.4055806Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T10:32:19.4055876Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T10:32:19.4055945Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T10:32:19.4056015Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T10:32:19.4056085Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T10:32:19.4056153Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T10:32:19.4056227Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T10:32:19.4056297Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T10:32:19.4056366Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T10:32:19.4056435Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T10:32:19.4056503Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T10:32:19.4056570Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T10:32:19.4056637Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T10:32:19.4056701Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T10:32:19.4056763Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T10:32:19.4056827Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T10:32:19.4056890Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T10:32:19.4056954Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-12-04T10:32:19.4057018Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T10:32:19.4057084Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T10:32:19.4057148Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T10:32:19.4057211Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T10:32:19.4057275Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T10:32:19.4057338Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T10:32:19.4057434Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T10:32:19.4057498Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T10:32:19.4057560Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T10:32:19.4057627Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T10:32:19.4057691Z * [new branch] 
gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T10:32:19.4057754Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T10:32:19.4057822Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T10:32:19.4057884Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T10:32:19.4057947Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T10:32:19.4058012Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T10:32:19.4058076Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T10:32:19.4058137Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T10:32:19.4058200Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T10:32:19.4058292Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T10:32:19.4058356Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T10:32:19.4058418Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T10:32:19.4058480Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 2025-12-04T10:32:19.4058541Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T10:32:19.4058607Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T10:32:19.4058671Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T10:32:19.4058733Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T10:32:19.4058797Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T10:32:19.4058860Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T10:32:19.4058922Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T10:32:19.4058985Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T10:32:19.4059049Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T10:32:19.4059111Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T10:32:19.4059173Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T10:32:19.4059237Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T10:32:19.4059300Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T10:32:19.4059362Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T10:32:19.4059423Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T10:32:19.4059515Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T10:32:19.4059635Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T10:32:19.4059718Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T10:32:19.4059801Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T10:32:19.4059880Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T10:32:19.4060006Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T10:32:19.4060089Z * [new branch] gh/robert-hardwick/5/base -> origin/gh/robert-hardwick/5/base 2025-12-04T10:32:19.4060169Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T10:32:19.4060250Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T10:32:19.4060335Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T10:32:19.4060415Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 
2025-12-04T10:32:19.4060496Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T10:32:19.4060578Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T10:32:19.4060660Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T10:32:19.4060742Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T10:32:19.4060824Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T10:32:19.4060958Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T10:32:19.4061040Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T10:32:19.4061120Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T10:32:19.4061199Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T10:32:19.4061280Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T10:32:19.4061351Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T10:32:19.4061420Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T10:32:19.4061488Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T10:32:19.4061553Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T10:32:19.4061623Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T10:32:19.4061690Z * [new branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T10:32:19.4061756Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T10:32:19.4061821Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T10:32:19.4061889Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T10:32:19.4061955Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T10:32:19.4062022Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T10:32:19.4062090Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T10:32:19.4062156Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T10:32:19.4062223Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T10:32:19.4062291Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T10:32:19.4062357Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T10:32:19.4062424Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T10:32:19.4062490Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T10:32:19.4062555Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T10:32:19.4062661Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T10:32:19.4062727Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T10:32:19.4062793Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T10:32:19.4062858Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T10:32:19.4062924Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T10:32:19.4062990Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T10:32:19.4063057Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T10:32:19.4063122Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T10:32:19.4063187Z * [new branch] gh/rtimpe/29/orig -> 
origin/gh/rtimpe/29/orig 2025-12-04T10:32:19.4063257Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T10:32:19.4063322Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T10:32:19.4063387Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T10:32:19.4063454Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T10:32:19.4063546Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T10:32:19.4063612Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T10:32:19.4063678Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T10:32:19.4063743Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T10:32:19.4063807Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T10:32:19.4063874Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T10:32:19.4063939Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T10:32:19.4064005Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T10:32:19.4064070Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T10:32:19.4064137Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T10:32:19.4064204Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T10:32:19.4064269Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T10:32:19.4064334Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T10:32:19.4064400Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T10:32:19.4064467Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T10:32:19.4064533Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T10:32:19.4064600Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T10:32:19.4064667Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-12-04T10:32:19.4064750Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T10:32:19.4064832Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T10:32:19.4064908Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T10:32:19.4064984Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T10:32:19.4065063Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T10:32:19.4065170Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T10:32:19.4065244Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T10:32:19.4065321Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T10:32:19.4065397Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T10:32:19.4065473Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T10:32:19.4065551Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T10:32:19.4065625Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T10:32:19.4065702Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T10:32:19.4065776Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T10:32:19.4065854Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 
2025-12-04T10:32:19.4065929Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T10:32:19.4066004Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T10:32:19.4066105Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T10:32:19.4066182Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T10:32:19.4066258Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T10:32:19.4066332Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T10:32:19.4066412Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T10:32:19.4066485Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T10:32:19.4066561Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T10:32:19.4066636Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T10:32:19.4066711Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T10:32:19.4066786Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T10:32:19.4066860Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T10:32:19.4066931Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T10:32:19.4067003Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T10:32:19.4067076Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T10:32:19.4067148Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T10:32:19.4067225Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T10:32:19.4067298Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T10:32:19.4067369Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T10:32:19.4067445Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T10:32:19.4067516Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T10:32:19.4067588Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T10:32:19.4067661Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T10:32:19.4067733Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-12-04T10:32:19.4067838Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T10:32:19.4067911Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T10:32:19.4067983Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T10:32:19.4068054Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T10:32:19.4068129Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T10:32:19.4068201Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T10:32:19.4068272Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T10:32:19.4068346Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T10:32:19.4068419Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T10:32:19.4068492Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T10:32:19.4068565Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 
2025-12-04T10:32:19.4068637Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T10:32:19.4068712Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T10:32:19.4068811Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T10:32:19.4068883Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T10:32:19.4068957Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T10:32:19.4069029Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T10:32:19.4069101Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T10:32:19.4069175Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T10:32:19.4069249Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 2025-12-04T10:32:19.4069325Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T10:32:19.4069402Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T10:32:19.4069477Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T10:32:19.4069552Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T10:32:19.4069676Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T10:32:19.4069752Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T10:32:19.4069825Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T10:32:19.4069901Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T10:32:19.4069975Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T10:32:19.4070049Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T10:32:19.4070125Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T10:32:19.4070199Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T10:32:19.4070273Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T10:32:19.4070350Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T10:32:19.4070424Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T10:32:19.4070550Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T10:32:19.4070623Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T10:32:19.4070698Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T10:32:19.4070776Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T10:32:19.4070852Z * [new branch] gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T10:32:19.4070927Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T10:32:19.4071002Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T10:32:19.4071075Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T10:32:19.4071147Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T10:32:19.4071223Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T10:32:19.4071295Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 
2025-12-04T10:32:19.4071368Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T10:32:19.4071489Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T10:32:19.4071563Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T10:32:19.4071637Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T10:32:19.4071713Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T10:32:19.4071787Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T10:32:19.4071860Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T10:32:19.4071935Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T10:32:19.4072009Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T10:32:19.4072082Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T10:32:19.4072157Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T10:32:19.4072231Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T10:32:19.4072305Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 2025-12-04T10:32:19.4072378Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T10:32:19.4072451Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T10:32:19.4072526Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T10:32:19.4072599Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T10:32:19.4072673Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T10:32:19.4072747Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T10:32:19.4072820Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T10:32:19.4072893Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T10:32:19.4072968Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T10:32:19.4073041Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T10:32:19.4073114Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T10:32:19.4073225Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T10:32:19.4073299Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T10:32:19.4073372Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T10:32:19.4073448Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T10:32:19.4073522Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T10:32:19.4073598Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T10:32:19.4073669Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T10:32:19.4073740Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T10:32:19.4073812Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-12-04T10:32:19.4073883Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T10:32:19.4073953Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T10:32:19.4074023Z * [new 
branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T10:32:19.4074128Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T10:32:19.4074199Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T10:32:19.4074270Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T10:32:19.4074339Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T10:32:19.4074408Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T10:32:19.4074477Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T10:32:19.4074547Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T10:32:19.4074617Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T10:32:19.4074687Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T10:32:19.4074757Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T10:32:19.4074828Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T10:32:19.4074900Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T10:32:19.4074970Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T10:32:19.4075040Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T10:32:19.4075112Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T10:32:19.4075183Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T10:32:19.4075255Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T10:32:19.4075325Z * [new branch] gh/slayton58/46/orig -> origin/gh/slayton58/46/orig 2025-12-04T10:32:19.4075397Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T10:32:19.4075468Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T10:32:19.4075538Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T10:32:19.4075606Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T10:32:19.4075680Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T10:32:19.4075753Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T10:32:19.4075856Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T10:32:19.4075933Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T10:32:19.4076004Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T10:32:19.4076078Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T10:32:19.4076152Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T10:32:19.4076224Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T10:32:19.4076297Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T10:32:19.4076374Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T10:32:19.4076449Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T10:32:19.4076522Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T10:32:19.4076597Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T10:32:19.4076669Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 
2025-12-04T10:32:19.4076767Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T10:32:19.4076838Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-12-04T10:32:19.4076910Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T10:32:19.4076983Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T10:32:19.4077054Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T10:32:19.4077128Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T10:32:19.4077203Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T10:32:19.4077275Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T10:32:19.4077346Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T10:32:19.4077420Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T10:32:19.4077493Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T10:32:19.4077564Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T10:32:19.4077637Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T10:32:19.4077709Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T10:32:19.4077782Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T10:32:19.4077855Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T10:32:19.4077926Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T10:32:19.4077997Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T10:32:19.4078070Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T10:32:19.4078142Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T10:32:19.4078214Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T10:32:19.4078289Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T10:32:19.4078361Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T10:32:19.4078459Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T10:32:19.4078529Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T10:32:19.4078600Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T10:32:19.4078676Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T10:32:19.4078747Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T10:32:19.4078818Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T10:32:19.4078890Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T10:32:19.4078962Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T10:32:19.4079034Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T10:32:19.4079108Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T10:32:19.4079180Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T10:32:19.4079253Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T10:32:19.4079352Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 
2025-12-04T10:32:19.4079424Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T10:32:19.4079497Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T10:32:19.4079609Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T10:32:19.4079681Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T10:32:19.4079753Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T10:32:19.4079826Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T10:32:19.4079897Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-12-04T10:32:19.4079969Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T10:32:19.4080041Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T10:32:19.4080113Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T10:32:19.4080186Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T10:32:19.4080256Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T10:32:19.4080330Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T10:32:19.4080402Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T10:32:19.4080477Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T10:32:19.4080549Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T10:32:19.4080621Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T10:32:19.4080696Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T10:32:19.4080770Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T10:32:19.4080842Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T10:32:19.4080913Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T10:32:19.4080985Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T10:32:19.4081109Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T10:32:19.4081180Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T10:32:19.4081255Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T10:32:19.4081328Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T10:32:19.4081401Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 2025-12-04T10:32:19.4081476Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T10:32:19.4081547Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T10:32:19.4081618Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T10:32:19.4081691Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T10:32:19.4081764Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T10:32:19.4081837Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T10:32:19.4081908Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T10:32:19.4081978Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T10:32:19.4082121Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 
2025-12-04T10:32:19.4082193Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T10:32:19.4082264Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T10:32:19.4082337Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T10:32:19.4082407Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T10:32:19.4082479Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T10:32:19.4082549Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T10:32:19.4082620Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T10:32:19.4082690Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T10:32:19.4082762Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T10:32:19.4082833Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T10:32:19.4082904Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T10:32:19.4082974Z * [new branch] gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T10:32:19.4083043Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T10:32:19.4083116Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T10:32:19.4083186Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T10:32:19.4083255Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T10:32:19.4083326Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T10:32:19.4083399Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T10:32:19.4083468Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T10:32:19.4083538Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T10:32:19.4083607Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T10:32:19.4083678Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T10:32:19.4083776Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T10:32:19.4083846Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T10:32:19.4083915Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T10:32:19.4083988Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T10:32:19.4084058Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T10:32:19.4084128Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T10:32:19.4084200Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T10:32:19.4084270Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T10:32:19.4084341Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T10:32:19.4084414Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 2025-12-04T10:32:19.4084484Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T10:32:19.4084553Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T10:32:19.4084658Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T10:32:19.4084728Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T10:32:19.4084797Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 
2025-12-04T10:32:19.4084867Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T10:32:19.4084938Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T10:32:19.4085009Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T10:32:19.4085081Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T10:32:19.4085150Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T10:32:19.4085221Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T10:32:19.4085291Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T10:32:19.4085361Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T10:32:19.4085432Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T10:32:19.4085501Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T10:32:19.4085570Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T10:32:19.4085640Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T10:32:19.4085710Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T10:32:19.4085779Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T10:32:19.4085848Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T10:32:19.4085919Z * [new branch] gh/swolchok/867/orig -> origin/gh/swolchok/867/orig 2025-12-04T10:32:19.4085988Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T10:32:19.4086059Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T10:32:19.4086128Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T10:32:19.4086197Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T10:32:19.4086294Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T10:32:19.4086363Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T10:32:19.4086436Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T10:32:19.4086504Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T10:32:19.4086576Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T10:32:19.4086647Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T10:32:19.4086716Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T10:32:19.4086785Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T10:32:19.4086856Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T10:32:19.4086926Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T10:32:19.4086994Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T10:32:19.4087064Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T10:32:19.4087131Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T10:32:19.4087224Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T10:32:19.4087293Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T10:32:19.4087360Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T10:32:19.4087428Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-12-04T10:32:19.4087496Z * [new 
branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T10:32:19.4087564Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T10:32:19.4087654Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T10:32:19.4087739Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T10:32:19.4087822Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T10:32:19.4087906Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T10:32:19.4087991Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T10:32:19.4088071Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T10:32:19.4088154Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T10:32:19.4088236Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T10:32:19.4088318Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T10:32:19.4088401Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T10:32:19.4088483Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T10:32:19.4088562Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T10:32:19.4088645Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T10:32:19.4088726Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T10:32:19.4088808Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T10:32:19.4088893Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T10:32:19.4089003Z * [new branch] gh/tugsbayasgalan/32/head -> origin/gh/tugsbayasgalan/32/head 2025-12-04T10:32:19.4089084Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T10:32:19.4089167Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T10:32:19.4089248Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T10:32:19.4089328Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T10:32:19.4089413Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T10:32:19.4089494Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T10:32:19.4089609Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T10:32:19.4089691Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T10:32:19.4089774Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T10:32:19.4089857Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T10:32:19.4089936Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T10:32:19.4090068Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T10:32:19.4090151Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T10:32:19.4090231Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T10:32:19.4090312Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 
2025-12-04T10:32:19.4090393Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T10:32:19.4090476Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T10:32:19.4090556Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T10:32:19.4090641Z * [new branch] gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T10:32:19.4090724Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T10:32:19.4090805Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T10:32:19.4090886Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T10:32:19.4090967Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T10:32:19.4091047Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T10:32:19.4091130Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T10:32:19.4091215Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T10:32:19.4091299Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T10:32:19.4091381Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T10:32:19.4091463Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T10:32:19.4091546Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T10:32:19.4091629Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T10:32:19.4091710Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T10:32:19.4091791Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T10:32:19.4091922Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T10:32:19.4092004Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T10:32:19.4092085Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T10:32:19.4092167Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T10:32:19.4092248Z * [new branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T10:32:19.4092332Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T10:32:19.4092413Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T10:32:19.4092494Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T10:32:19.4092579Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T10:32:19.4092659Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T10:32:19.4092740Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T10:32:19.4092824Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T10:32:19.4092929Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T10:32:19.4093010Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T10:32:19.4093090Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T10:32:19.4093171Z * [new branch] 
gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T10:32:19.4093251Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T10:32:19.4093332Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T10:32:19.4093410Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T10:32:19.4093493Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T10:32:19.4093575Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T10:32:19.4093658Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T10:32:19.4093739Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 2025-12-04T10:32:19.4093819Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T10:32:19.4093900Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T10:32:19.4093987Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T10:32:19.4094069Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T10:32:19.4094150Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T10:32:19.4094233Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T10:32:19.4094315Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T10:32:19.4094398Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T10:32:19.4094478Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T10:32:19.4094559Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T10:32:19.4094642Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T10:32:19.4094750Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T10:32:19.4094831Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T10:32:19.4094914Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T10:32:19.4094997Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T10:32:19.4095078Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T10:32:19.4095159Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T10:32:19.4095240Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T10:32:19.4095320Z * [new branch] gh/tugsbayasgalan/77/head -> origin/gh/tugsbayasgalan/77/head 2025-12-04T10:32:19.4095405Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T10:32:19.4095487Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T10:32:19.4095567Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T10:32:19.4095678Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T10:32:19.4095759Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T10:32:19.4095840Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T10:32:19.4095924Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 
2025-12-04T10:32:19.4096004Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T10:32:19.4096086Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T10:32:19.4096165Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T10:32:19.4096247Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T10:32:19.4096328Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T10:32:19.4096412Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T10:32:19.4096493Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T10:32:19.4096574Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T10:32:19.4096655Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T10:32:19.4096736Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T10:32:19.4096818Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T10:32:19.4096900Z * [new branch] gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T10:32:19.4096980Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T10:32:19.4097062Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T10:32:19.4097143Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T10:32:19.4097222Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T10:32:19.4097305Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T10:32:19.4097387Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T10:32:19.4097496Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T10:32:19.4097578Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T10:32:19.4097659Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T10:32:19.4097740Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T10:32:19.4097820Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T10:32:19.4097900Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T10:32:19.4097982Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T10:32:19.4098062Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T10:32:19.4098142Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T10:32:19.4098225Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T10:32:19.4098305Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T10:32:19.4098385Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T10:32:19.4098497Z * [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T10:32:19.4098579Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T10:32:19.4098660Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T10:32:19.4098740Z * [new branch] 
gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T10:32:19.4098819Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T10:32:19.4098899Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T10:32:19.4098982Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T10:32:19.4099063Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T10:32:19.4099146Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T10:32:19.4099227Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T10:32:19.4099307Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T10:32:19.4099389Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T10:32:19.4099471Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T10:32:19.4099552Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T10:32:19.4099667Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T10:32:19.4099752Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T10:32:19.4099834Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T10:32:19.4099916Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T10:32:19.4099984Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T10:32:19.4100048Z * [new branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T10:32:19.4100116Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T10:32:19.4100180Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T10:32:19.4100290Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T10:32:19.4100351Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T10:32:19.4100412Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T10:32:19.4100473Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T10:32:19.4100538Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T10:32:19.4100599Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T10:32:19.4100663Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T10:32:19.4100726Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T10:32:19.4100787Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T10:32:19.4100848Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T10:32:19.4100912Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T10:32:19.4100973Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T10:32:19.4101036Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T10:32:19.4101132Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T10:32:19.4101211Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T10:32:19.4101288Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T10:32:19.4101362Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T10:32:19.4101435Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T10:32:19.4101509Z * [new branch] gh/vishal9-team/2/orig 
-> origin/gh/vishal9-team/2/orig 2025-12-04T10:32:19.4101584Z * [new branch] gh/vishal9-team/3/base -> origin/gh/vishal9-team/3/base 2025-12-04T10:32:19.4101659Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T10:32:19.4101732Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T10:32:19.4101805Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T10:32:19.4101877Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T10:32:19.4101951Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T10:32:19.4102017Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T10:32:19.4102081Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T10:32:19.4102151Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T10:32:19.4102223Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T10:32:19.4102298Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T10:32:19.4102371Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T10:32:19.4102443Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T10:32:19.4102515Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T10:32:19.4102584Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T10:32:19.4102654Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T10:32:19.4102724Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T10:32:19.4102832Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T10:32:19.4102902Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T10:32:19.4102972Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T10:32:19.4103041Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T10:32:19.4103113Z * [new branch] gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T10:32:19.4103184Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T10:32:19.4103255Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T10:32:19.4103324Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T10:32:19.4103395Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T10:32:19.4103467Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T10:32:19.4103536Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T10:32:19.4103608Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T10:32:19.4103678Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T10:32:19.4103778Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T10:32:19.4103851Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T10:32:19.4103921Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T10:32:19.4103993Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T10:32:19.4104064Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T10:32:19.4104134Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T10:32:19.4104205Z * [new branch] gh/wconstab/453/base -> 
origin/gh/wconstab/453/base 2025-12-04T10:32:19.4104276Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T10:32:19.4104347Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T10:32:19.4104421Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T10:32:19.4104490Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T10:32:19.4104559Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 2025-12-04T10:32:19.4104632Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T10:32:19.4104701Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T10:32:19.4104772Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T10:32:19.4104844Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T10:32:19.4104913Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T10:32:19.4104984Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T10:32:19.4105056Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T10:32:19.4105127Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T10:32:19.4105197Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T10:32:19.4105269Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T10:32:19.4105340Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T10:32:19.4105439Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T10:32:19.4105508Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T10:32:19.4105578Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T10:32:19.4105650Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T10:32:19.4105720Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T10:32:19.4105788Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T10:32:19.4105859Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T10:32:19.4105928Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T10:32:19.4105997Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T10:32:19.4106069Z * [new branch] gh/wconstab/461/orig -> origin/gh/wconstab/461/orig 2025-12-04T10:32:19.4106139Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T10:32:19.4106208Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T10:32:19.4106309Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T10:32:19.4106380Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T10:32:19.4106449Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T10:32:19.4106519Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T10:32:19.4106589Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T10:32:19.4106658Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T10:32:19.4106732Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T10:32:19.4106802Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T10:32:19.4106874Z * [new branch] gh/wconstab/465/head -> 
origin/gh/wconstab/465/head 2025-12-04T10:32:19.4106944Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T10:32:19.4107013Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T10:32:19.4107085Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T10:32:19.4107155Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T10:32:19.4107225Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T10:32:19.4107295Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T10:32:19.4107368Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T10:32:19.4107437Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T10:32:19.4107508Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T10:32:19.4107578Z * [new branch] gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T10:32:19.4107650Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T10:32:19.4107722Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T10:32:19.4107794Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T10:32:19.4107865Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T10:32:19.4107937Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T10:32:19.4108035Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T10:32:19.4108105Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T10:32:19.4108177Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T10:32:19.4108250Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T10:32:19.4108332Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T10:32:19.4108413Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T10:32:19.4108493Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T10:32:19.4108575Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T10:32:19.4108655Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T10:32:19.4108731Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T10:32:19.4108808Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T10:32:19.4108887Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T10:32:19.4108994Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T10:32:19.4109073Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T10:32:19.4109150Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T10:32:19.4109227Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T10:32:19.4109304Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T10:32:19.4109384Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T10:32:19.4109462Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T10:32:19.4109542Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T10:32:19.4109658Z * [new 
branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T10:32:19.4109739Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T10:32:19.4109817Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T10:32:19.4109895Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T10:32:19.4109973Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T10:32:19.4110052Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T10:32:19.4110129Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T10:32:19.4110209Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T10:32:19.4110285Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T10:32:19.4110363Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T10:32:19.4110442Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T10:32:19.4110520Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T10:32:19.4110598Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T10:32:19.4110677Z * [new branch] gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T10:32:19.4110797Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T10:32:19.4110875Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T10:32:19.4110958Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T10:32:19.4111036Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T10:32:19.4111115Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T10:32:19.4111195Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T10:32:19.4111273Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T10:32:19.4111351Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T10:32:19.4111430Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T10:32:19.4111510Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T10:32:19.4111589Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T10:32:19.4111665Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T10:32:19.4111797Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T10:32:19.4111878Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T10:32:19.4111956Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T10:32:19.4112032Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T10:32:19.4112111Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T10:32:19.4112189Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T10:32:19.4112266Z * [new branch] gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T10:32:19.4112344Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T10:32:19.4112424Z * [new branch] 
gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T10:32:19.4112500Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T10:32:19.4112577Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T10:32:19.4112653Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T10:32:19.4112731Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T10:32:19.4112809Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T10:32:19.4112889Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T10:32:19.4112966Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T10:32:19.4113043Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T10:32:19.4113120Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T10:32:19.4113199Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T10:32:19.4113275Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T10:32:19.4113352Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T10:32:19.4113431Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T10:32:19.4113537Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T10:32:19.4113616Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T10:32:19.4113694Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T10:32:19.4113773Z * [new branch] gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T10:32:19.4113850Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T10:32:19.4113929Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T10:32:19.4114006Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T10:32:19.4114083Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T10:32:19.4114162Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T10:32:19.4114241Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T10:32:19.4114319Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T10:32:19.4114398Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T10:32:19.4115430Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T10:32:19.4115510Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T10:32:19.4115586Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T10:32:19.4115662Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T10:32:19.4115739Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T10:32:19.4115817Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T10:32:19.4115894Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T10:32:19.4115974Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T10:32:19.4116052Z * [new branch] 
gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T10:32:19.4116130Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T10:32:19.4116208Z * [new branch] gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T10:32:19.4116288Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T10:32:19.4116365Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T10:32:19.4116443Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T10:32:19.4116521Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T10:32:19.4116598Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T10:32:19.4116679Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T10:32:19.4116760Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T10:32:19.4116839Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T10:32:19.4116916Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T10:32:19.4116993Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T10:32:19.4117072Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T10:32:19.4117178Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T10:32:19.4117255Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T10:32:19.4117334Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T10:32:19.4117414Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T10:32:19.4117492Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T10:32:19.4117569Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T10:32:19.4117645Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T10:32:19.4117721Z * [new branch] gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T10:32:19.4117799Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T10:32:19.4117878Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T10:32:19.4117955Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T10:32:19.4118034Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T10:32:19.4118135Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T10:32:19.4118213Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T10:32:19.4118290Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T10:32:19.4118367Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T10:32:19.4118446Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T10:32:19.4118526Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T10:32:19.4118602Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T10:32:19.4118683Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T10:32:19.4118759Z * [new branch] 
gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T10:32:19.4118837Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T10:32:19.4118918Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T10:32:19.4118995Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T10:32:19.4119071Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T10:32:19.4119150Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T10:32:19.4119228Z * [new branch] gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T10:32:19.4119305Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T10:32:19.4119385Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T10:32:19.4119464Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T10:32:19.4119541Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T10:32:19.4119656Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T10:32:19.4119737Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T10:32:19.4119815Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T10:32:19.4119942Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T10:32:19.4120020Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T10:32:19.4120100Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T10:32:19.4120180Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T10:32:19.4120260Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T10:32:19.4120340Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T10:32:19.4120409Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T10:32:19.4120475Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T10:32:19.4120543Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T10:32:19.4120611Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T10:32:19.4120678Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T10:32:19.4120743Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T10:32:19.4120808Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T10:32:19.4120917Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T10:32:19.4120985Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T10:32:19.4121050Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T10:32:19.4121116Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T10:32:19.4121183Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T10:32:19.4121249Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T10:32:19.4121315Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T10:32:19.4121381Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T10:32:19.4121447Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T10:32:19.4121516Z * [new branch] gh/xmfan/309/base -> 
origin/gh/xmfan/309/base 2025-12-04T10:32:19.4121582Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T10:32:19.4121647Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T10:32:19.4121715Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T10:32:19.4121779Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T10:32:19.4121848Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T10:32:19.4121915Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T10:32:19.4121980Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T10:32:19.4122044Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T10:32:19.4122111Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T10:32:19.4122177Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T10:32:19.4122243Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T10:32:19.4122310Z * [new branch] gh/xmfan/313/base -> origin/gh/xmfan/313/base 2025-12-04T10:32:19.4122376Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T10:32:19.4122440Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T10:32:19.4122549Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T10:32:19.4122627Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T10:32:19.4122702Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T10:32:19.4122778Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T10:32:19.4122854Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T10:32:19.4122929Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T10:32:19.4123003Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T10:32:19.4123076Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T10:32:19.4123153Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T10:32:19.4123227Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T10:32:19.4123300Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T10:32:19.4123376Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T10:32:19.4123486Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T10:32:19.4123559Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T10:32:19.4123635Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T10:32:19.4123707Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T10:32:19.4123779Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T10:32:19.4123852Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-12-04T10:32:19.4123921Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T10:32:19.4123989Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T10:32:19.4124061Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T10:32:19.4124129Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T10:32:19.4124199Z * [new branch] gh/yanbing-j/13/head -> 
origin/gh/yanbing-j/13/head 2025-12-04T10:32:19.4124269Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T10:32:19.4124339Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T10:32:19.4124408Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T10:32:19.4124479Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T10:32:19.4124547Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T10:32:19.4124618Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T10:32:19.4124687Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T10:32:19.4124754Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T10:32:19.4124825Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T10:32:19.4124894Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T10:32:19.4124962Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T10:32:19.4125034Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T10:32:19.4125155Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T10:32:19.4125224Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T10:32:19.4125296Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T10:32:19.4125367Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T10:32:19.4125435Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T10:32:19.4125505Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T10:32:19.4125575Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T10:32:19.4125643Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T10:32:19.4125712Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T10:32:19.4125782Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T10:32:19.4125851Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T10:32:19.4125923Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T10:32:19.4126022Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T10:32:19.4126092Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T10:32:19.4126164Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T10:32:19.4126232Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T10:32:19.4126302Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T10:32:19.4126371Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T10:32:19.4126441Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T10:32:19.4126513Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T10:32:19.4126582Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T10:32:19.4126662Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T10:32:19.4126740Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T10:32:19.4126813Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 2025-12-04T10:32:19.4126887Z * [new branch] 
gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T10:32:19.4126960Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T10:32:19.4127033Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T10:32:19.4127106Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T10:32:19.4127182Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T10:32:19.4127254Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T10:32:19.4127327Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T10:32:19.4127400Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T10:32:19.4127471Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T10:32:19.4127541Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T10:32:19.4127615Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T10:32:19.4127712Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T10:32:19.4127785Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T10:32:19.4127853Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T10:32:19.4127921Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T10:32:19.4127992Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T10:32:19.4128060Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T10:32:19.4128128Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T10:32:19.4128197Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T10:32:19.4128267Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T10:32:19.4128336Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T10:32:19.4128409Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T10:32:19.4128478Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T10:32:19.4128546Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T10:32:19.4128654Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T10:32:19.4128723Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T10:32:19.4128791Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T10:32:19.4128860Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T10:32:19.4128927Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T10:32:19.4128995Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T10:32:19.4129062Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T10:32:19.4129126Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T10:32:19.4129190Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T10:32:19.4129258Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T10:32:19.4129322Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T10:32:19.4129387Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T10:32:19.4129453Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T10:32:19.4129519Z * [new branch] gh/ydwu4/296/head -> 
origin/gh/ydwu4/296/head 2025-12-04T10:32:19.4129620Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T10:32:19.4129688Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T10:32:19.4129752Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T10:32:19.4129818Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-12-04T10:32:19.4129883Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T10:32:19.4129947Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T10:32:19.4130014Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T10:32:19.4130078Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T10:32:19.4130141Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T10:32:19.4130250Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T10:32:19.4130314Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T10:32:19.4130380Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T10:32:19.4130446Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T10:32:19.4130513Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T10:32:19.4130577Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T10:32:19.4130643Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T10:32:19.4130707Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T10:32:19.4130773Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T10:32:19.4130839Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T10:32:19.4130906Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T10:32:19.4130973Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T10:32:19.4131037Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T10:32:19.4131127Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T10:32:19.4131195Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T10:32:19.4131260Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T10:32:19.4131325Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 2025-12-04T10:32:19.4131391Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T10:32:19.4131455Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T10:32:19.4131523Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T10:32:19.4131590Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T10:32:19.4131655Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T10:32:19.4131722Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T10:32:19.4131789Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T10:32:19.4131853Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T10:32:19.4131918Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T10:32:19.4131984Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T10:32:19.4132049Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T10:32:19.4132115Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T10:32:19.4132181Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 
2025-12-04T10:32:19.4132246Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T10:32:19.4132312Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T10:32:19.4132378Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T10:32:19.4132443Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T10:32:19.4132509Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T10:32:19.4132572Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T10:32:19.4132636Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T10:32:19.4132731Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T10:32:19.4132804Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-12-04T10:32:19.4132876Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T10:32:19.4132951Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T10:32:19.4133021Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T10:32:19.4133092Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T10:32:19.4133166Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T10:32:19.4133239Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T10:32:19.4133311Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T10:32:19.4133383Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T10:32:19.4133452Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T10:32:19.4133526Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T10:32:19.4133631Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T10:32:19.4133703Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T10:32:19.4133774Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T10:32:19.4133848Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T10:32:19.4133919Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T10:32:19.4133990Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T10:32:19.4134062Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T10:32:19.4134134Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T10:32:19.4134206Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T10:32:19.4134278Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T10:32:19.4134347Z * [new branch] gh/yushangdi/7/head -> origin/gh/yushangdi/7/head 2025-12-04T10:32:19.4134418Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T10:32:19.4134487Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T10:32:19.4134558Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T10:32:19.4134629Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T10:32:19.4134702Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T10:32:19.4134771Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T10:32:19.4134841Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T10:32:19.4134909Z * [new branch] gh/zklaus/19/base -> 
origin/gh/zklaus/19/base 2025-12-04T10:32:19.4134976Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T10:32:19.4135042Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T10:32:19.4135108Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T10:32:19.4135173Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T10:32:19.4135238Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T10:32:19.4135332Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T10:32:19.4135397Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T10:32:19.4135462Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T10:32:19.4135529Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T10:32:19.4135593Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T10:32:19.4135659Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T10:32:19.4135723Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T10:32:19.4135790Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T10:32:19.4135854Z * [new branch] gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T10:32:19.4135920Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T10:32:19.4135985Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T10:32:19.4136049Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T10:32:19.4136147Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T10:32:19.4136217Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T10:32:19.4136289Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T10:32:19.4136358Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T10:32:19.4136426Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T10:32:19.4136494Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T10:32:19.4136563Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T10:32:19.4136634Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T10:32:19.4136702Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T10:32:19.4136771Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T10:32:19.4136839Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T10:32:19.4136907Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T10:32:19.4136975Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T10:32:19.4137046Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T10:32:19.4137112Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T10:32:19.4137181Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T10:32:19.4137250Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T10:32:19.4137316Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-12-04T10:32:19.4137384Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T10:32:19.4137451Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T10:32:19.4137516Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 
2025-12-04T10:32:19.4137583Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T10:32:19.4137647Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T10:32:19.4137712Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T10:32:19.4137807Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T10:32:19.4137873Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T10:32:19.4137937Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T10:32:19.4138003Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T10:32:19.4138069Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T10:32:19.4138134Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T10:32:19.4138200Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T10:32:19.4138266Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T10:32:19.4138331Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T10:32:19.4138399Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T10:32:19.4138464Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T10:32:19.4138530Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T10:32:19.4138596Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T10:32:19.4138686Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T10:32:19.4138751Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T10:32:19.4138818Z * [new branch] gh/zpcore/22/orig -> origin/gh/zpcore/22/orig 2025-12-04T10:32:19.4138883Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T10:32:19.4138948Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T10:32:19.4139015Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T10:32:19.4139080Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T10:32:19.4139145Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T10:32:19.4139210Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T10:32:19.4139278Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T10:32:19.4139344Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T10:32:19.4139409Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T10:32:19.4139473Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T10:32:19.4139540Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T10:32:19.4139629Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T10:32:19.4139696Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T10:32:19.4139763Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T10:32:19.4139828Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T10:32:19.4139895Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T10:32:19.4139962Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T10:32:19.4140027Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T10:32:19.4140094Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T10:32:19.4140160Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T10:32:19.4140225Z * [new branch] 
gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T10:32:19.4140333Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-12-04T10:32:19.4140400Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T10:32:19.4140464Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T10:32:19.4140533Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T10:32:19.4140597Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T10:32:19.4140662Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T10:32:19.4140727Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T10:32:19.4140792Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T10:32:19.4140857Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T10:32:19.4140926Z * [new branch] google-main -> origin/google-main 2025-12-04T10:32:19.4141011Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T10:32:19.4141082Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T10:32:19.4141264Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T10:32:19.4141381Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T10:32:19.4141518Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T10:32:19.4141626Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T10:32:19.4141689Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T10:32:19.4141753Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T10:32:19.4141815Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T10:32:19.4142005Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T10:32:19.4142070Z * [new branch] inlining -> origin/inlining 2025-12-04T10:32:19.4142139Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-12-04T10:32:19.4142221Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T10:32:19.4142398Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T10:32:19.4142468Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T10:32:19.4142534Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T10:32:19.4142614Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T10:32:19.4142674Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T10:32:19.4142733Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T10:32:19.4142859Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T10:32:19.4142962Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T10:32:19.4143072Z * [new branch] jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter 2025-12-04T10:32:19.4143181Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T10:32:19.4143267Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T10:32:19.4143389Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T10:32:19.4143471Z * [new branch] 
jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T10:32:19.4143551Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T10:32:19.4143631Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T10:32:19.4143711Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T10:32:19.4143790Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T10:32:19.4143871Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 2025-12-04T10:32:19.4143947Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T10:32:19.4144027Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T10:32:19.4144106Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T10:32:19.4144177Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T10:32:19.4144240Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T10:32:19.4144352Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T10:32:19.4144457Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T10:32:19.4144559Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T10:32:19.4144639Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T10:32:19.4144738Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T10:32:19.4144817Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T10:32:19.4144887Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T10:32:19.4144953Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T10:32:19.4145027Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T10:32:19.4145105Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T10:32:19.4145188Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T10:32:19.4145285Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T10:32:19.4145390Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T10:32:19.4145515Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T10:32:19.4145627Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T10:32:19.4145760Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T10:32:19.4145840Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T10:32:19.4145932Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T10:32:19.4146029Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T10:32:19.4146123Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T10:32:19.4146225Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T10:32:19.4146345Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T10:32:19.4146453Z * [new branch] lucaskabela/typing_variables_dicts -> 
origin/lucaskabela/typing_variables_dicts 2025-12-04T10:32:19.4146574Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T10:32:19.4146682Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T10:32:19.4146752Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T10:32:19.4146813Z * [new branch] main -> origin/main 2025-12-04T10:32:19.4146882Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T10:32:19.4146951Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T10:32:19.4147018Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T10:32:19.4147082Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T10:32:19.4147146Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T10:32:19.4147211Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T10:32:19.4147315Z * [new branch] malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T10:32:19.4147382Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T10:32:19.4147456Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T10:32:19.4147616Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T10:32:19.4147783Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T10:32:19.4147911Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T10:32:19.4148009Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T10:32:19.4148128Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T10:32:19.4148218Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T10:32:19.4148292Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T10:32:19.4148369Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T10:32:19.4148448Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T10:32:19.4148524Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T10:32:19.4148599Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T10:32:19.4148662Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T10:32:19.4148736Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T10:32:19.4148797Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T10:32:19.4148859Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T10:32:19.4148932Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T10:32:19.4149009Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-12-04T10:32:19.4149108Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T10:32:19.4149181Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T10:32:19.4149271Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T10:32:19.4149337Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T10:32:19.4149403Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T10:32:19.4149465Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 
2025-12-04T10:32:19.4149534Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T10:32:19.4149644Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T10:32:19.4149719Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T10:32:19.4149798Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T10:32:19.4149900Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T10:32:19.4149975Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T10:32:19.4150055Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T10:32:19.4150137Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T10:32:19.4150247Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T10:32:19.4150319Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T10:32:19.4150387Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T10:32:19.4150456Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T10:32:19.4150524Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T10:32:19.4150592Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T10:32:19.4150656Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T10:32:19.4150737Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-12-04T10:32:19.4150807Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T10:32:19.4150869Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T10:32:19.4150936Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T10:32:19.4151014Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T10:32:19.4151081Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T10:32:19.4151148Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T10:32:19.4151214Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T10:32:19.4151281Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T10:32:19.4151349Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T10:32:19.4151410Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T10:32:19.4151476Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T10:32:19.4151546Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T10:32:19.4151614Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T10:32:19.4151681Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T10:32:19.4151745Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T10:32:19.4151809Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T10:32:19.4151909Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T10:32:19.4151973Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T10:32:19.4152032Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T10:32:19.4152093Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T10:32:19.4152155Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T10:32:19.4152215Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T10:32:19.4152275Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T10:32:19.4152336Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-12-04T10:32:19.4152395Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 
2025-12-04T10:32:19.4152456Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T10:32:19.4152516Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T10:32:19.4152573Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T10:32:19.4152633Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T10:32:19.4152704Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T10:32:19.4152827Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T10:32:19.4152890Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T10:32:19.4152950Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T10:32:19.4153023Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T10:32:19.4153129Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T10:32:19.4153226Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T10:32:19.4153292Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T10:32:19.4153361Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T10:32:19.4153425Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T10:32:19.4153499Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T10:32:19.4153575Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T10:32:19.4153643Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T10:32:19.4153716Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T10:32:19.4153789Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T10:32:19.4153855Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T10:32:19.4153919Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T10:32:19.4153983Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-12-04T10:32:19.4154065Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T10:32:19.4154153Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T10:32:19.4154216Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T10:32:19.4154284Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T10:32:19.4154350Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T10:32:19.4154427Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T10:32:19.4154488Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T10:32:19.4154582Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T10:32:19.4154660Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T10:32:19.4154735Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T10:32:19.4154814Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T10:32:19.4154889Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T10:32:19.4154965Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T10:32:19.4155039Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T10:32:19.4155111Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T10:32:19.4155183Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T10:32:19.4155262Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T10:32:19.4155343Z * 
[new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T10:32:19.4155415Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T10:32:19.4155538Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T10:32:19.4155633Z * [new branch] mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2 2025-12-04T10:32:19.4155704Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T10:32:19.4155773Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T10:32:19.4155846Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T10:32:19.4155918Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T10:32:19.4155982Z * [new branch] module-shim -> origin/module-shim 2025-12-04T10:32:19.4156043Z * [new branch] move_config -> origin/move_config 2025-12-04T10:32:19.4156113Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T10:32:19.4156184Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T10:32:19.4156285Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T10:32:19.4156353Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T10:32:19.4156427Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T10:32:19.4156489Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T10:32:19.4156557Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T10:32:19.4156628Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T10:32:19.4156692Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T10:32:19.4156770Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T10:32:19.4156841Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T10:32:19.4156928Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T10:32:19.4156992Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T10:32:19.4157059Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T10:32:19.4157128Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T10:32:19.4157193Z * [new branch] nightly -> origin/nightly 2025-12-04T10:32:19.4157341Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T10:32:19.4157463Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T10:32:19.4157589Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T10:32:19.4157710Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T10:32:19.4157826Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T10:32:19.4157935Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T10:32:19.4158002Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T10:32:19.4158129Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T10:32:19.4158205Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T10:32:19.4158268Z * [new branch] nofun-hack -> origin/nofun-hack 
2025-12-04T10:32:19.4158359Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T10:32:19.4158434Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T10:32:19.4158507Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T10:32:19.4158575Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T10:32:19.4158643Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T10:32:19.4158710Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T10:32:19.4158779Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T10:32:19.4158845Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T10:32:19.4158913Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-12-04T10:32:19.4158979Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T10:32:19.4159043Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T10:32:19.4159110Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T10:32:19.4159174Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T10:32:19.4159240Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T10:32:19.4159308Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T10:32:19.4159373Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T10:32:19.4159438Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T10:32:19.4159505Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T10:32:19.4159614Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T10:32:19.4159679Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T10:32:19.4159744Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T10:32:19.4159808Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T10:32:19.4159893Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T10:32:19.4159975Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T10:32:19.4160103Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T10:32:19.4160170Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T10:32:19.4160238Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T10:32:19.4160305Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T10:32:19.4160371Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T10:32:19.4160439Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T10:32:19.4160501Z * [new branch] pca2 -> origin/pca2 2025-12-04T10:32:19.4160573Z * [new branch] per_channel_backup -> origin/per_channel_backup 2025-12-04T10:32:19.4160636Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T10:32:19.4160701Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T10:32:19.4160772Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T10:32:19.4160857Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T10:32:19.4161009Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T10:32:19.4161111Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T10:32:19.4161195Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 
2025-12-04T10:32:19.4161285Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T10:32:19.4161388Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T10:32:19.4161486Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T10:32:19.4161589Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T10:32:19.4161663Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T10:32:19.4161745Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T10:32:19.4161857Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T10:32:19.4161942Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T10:32:19.4162038Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T10:32:19.4162123Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T10:32:19.4162214Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 2025-12-04T10:32:19.4162301Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T10:32:19.4162381Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T10:32:19.4162487Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T10:32:19.4162572Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T10:32:19.4162655Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T10:32:19.4162748Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T10:32:19.4162846Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T10:32:19.4162971Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T10:32:19.4163096Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T10:32:19.4163200Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T10:32:19.4163297Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T10:32:19.4163410Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T10:32:19.4163502Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T10:32:19.4163605Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T10:32:19.4163686Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T10:32:19.4163771Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T10:32:19.4163847Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T10:32:19.4163948Z * [new branch] pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T10:32:19.4164098Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T10:32:19.4164211Z * [new branch] pianpwk/oblivious_reshape_view_better -> 
origin/pianpwk/oblivious_reshape_view_better 2025-12-04T10:32:19.4164292Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T10:32:19.4164399Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T10:32:19.4164502Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T10:32:19.4164585Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T10:32:19.4164665Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T10:32:19.4164777Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T10:32:19.4164875Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T10:32:19.4164961Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T10:32:19.4165040Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T10:32:19.4165133Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T10:32:19.4165227Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T10:32:19.4165305Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T10:32:19.4165385Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T10:32:19.4165477Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T10:32:19.4165551Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T10:32:19.4165622Z * [new branch] pool-separate -> origin/pool-separate 2025-12-04T10:32:19.4165682Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T10:32:19.4165743Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T10:32:19.4165811Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T10:32:19.4165875Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T10:32:19.4165969Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T10:32:19.4166052Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T10:32:19.4166181Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T10:32:19.4166320Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T10:32:19.4166400Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T10:32:19.4166472Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T10:32:19.4166569Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T10:32:19.4166633Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T10:32:19.4166698Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T10:32:19.4166760Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T10:32:19.4166822Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T10:32:19.4166882Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T10:32:19.4166973Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T10:32:19.4167034Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T10:32:19.4167094Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T10:32:19.4167153Z * [new branch] release/1.7 -> 
origin/release/1.7 2025-12-04T10:32:19.4167212Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T10:32:19.4167271Z * [new branch] release/1.9 -> origin/release/1.9 2025-12-04T10:32:19.4167334Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T10:32:19.4167393Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T10:32:19.4167451Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T10:32:19.4167512Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T10:32:19.4167572Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T10:32:19.4167632Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T10:32:19.4167692Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T10:32:19.4167751Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T10:32:19.4167811Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T10:32:19.4167873Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T10:32:19.4167935Z * [new branch] release_notes -> origin/release_notes 2025-12-04T10:32:19.4168009Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T10:32:19.4168133Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T10:32:19.4168255Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T10:32:19.4168373Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T10:32:19.4168489Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T10:32:19.4168617Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T10:32:19.4168758Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T10:32:19.4168859Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T10:32:19.4168960Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-12-04T10:32:19.4169132Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T10:32:19.4169227Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T10:32:19.4169323Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T10:32:19.4169390Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T10:32:19.4169484Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T10:32:19.4169608Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T10:32:19.4169716Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T10:32:19.4169866Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T10:32:19.4169970Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T10:32:19.4170052Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T10:32:19.4170196Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T10:32:19.4170283Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 
2025-12-04T10:32:19.4170360Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T10:32:19.4170422Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T10:32:19.4170484Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T10:32:19.4170548Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T10:32:19.4170612Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T10:32:19.4170779Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-12-04T10:32:19.4170870Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T10:32:19.4170981Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T10:32:19.4171042Z * [new branch] save -> origin/save 2025-12-04T10:32:19.4171104Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T10:32:19.4187710Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T10:32:19.4187803Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T10:32:19.4187944Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T10:32:19.4188035Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T10:32:19.4188116Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T10:32:19.4188190Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T10:32:19.4188270Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T10:32:19.4188353Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T10:32:19.4188508Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T10:32:19.4188580Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T10:32:19.4188640Z * [new branch] suo -> origin/suo 2025-12-04T10:32:19.4188704Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T10:32:19.4188766Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T10:32:19.4188859Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T10:32:19.4188927Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T10:32:19.4188995Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T10:32:19.4189062Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T10:32:19.4189130Z * [new branch] sy_deserialize -> origin/sy_deserialize 2025-12-04T10:32:19.4189196Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T10:32:19.4189257Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T10:32:19.4189327Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T10:32:19.4189433Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T10:32:19.4189500Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T10:32:19.4189562Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T10:32:19.4189673Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T10:32:19.4189745Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T10:32:19.4189817Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T10:32:19.4189880Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T10:32:19.4189964Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T10:32:19.4190041Z * 
[new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T10:32:19.4190126Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T10:32:19.4190187Z * [new branch] test-old -> origin/test-old 2025-12-04T10:32:19.4190250Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T10:32:19.4190347Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T10:32:19.4190462Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T10:32:19.4190548Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T10:32:19.4190674Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T10:32:19.4190808Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T10:32:19.4190913Z * [new branch] tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T10:32:19.4191007Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T10:32:19.4191105Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T10:32:19.4191212Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T10:32:19.4191313Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T10:32:19.4191444Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T10:32:19.4191526Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T10:32:19.4191590Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T10:32:19.4191666Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T10:32:19.4191725Z * [new branch] tmp -> origin/tmp 2025-12-04T10:32:19.4191790Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T10:32:19.4191867Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T10:32:19.4191952Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T10:32:19.4192036Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T10:32:19.4192104Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T10:32:19.4192169Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T10:32:19.4192229Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T10:32:19.4192332Z * [new branch] type_dec -> origin/type_dec 2025-12-04T10:32:19.4192426Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T10:32:19.4192565Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T10:32:19.4192701Z * [new branch] update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T10:32:19.4192831Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T10:32:19.4192961Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T10:32:19.4193092Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T10:32:19.4193222Z * [new branch] 
update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T10:32:19.4193356Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T10:32:19.4193490Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T10:32:19.4193622Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T10:32:19.4193754Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T10:32:19.4193887Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T10:32:19.4194019Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T10:32:19.4194150Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T10:32:19.4194233Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T10:32:19.4194357Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T10:32:19.4194481Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T10:32:19.4194631Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T10:32:19.4194759Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T10:32:19.4194843Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T10:32:19.4194933Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T10:32:19.4195019Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T10:32:19.4195106Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T10:32:19.4195190Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T10:32:19.4195288Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T10:32:19.4195366Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T10:32:19.4195455Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T10:32:19.4195557Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T10:32:19.4195645Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T10:32:19.4195705Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T10:32:19.4195767Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T10:32:19.4195822Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T10:32:19.4195878Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T10:32:19.4195936Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T10:32:19.4195992Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T10:32:19.4196056Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T10:32:19.4196124Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T10:32:19.4196192Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-12-04T10:32:19.4196256Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T10:32:19.4196331Z * 
[new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T10:32:19.4196405Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T10:32:19.4196468Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T10:32:19.4196586Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T10:32:19.4196650Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T10:32:19.4196710Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T10:32:19.4196802Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T10:32:19.4196869Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T10:32:19.4196934Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T10:32:19.4196996Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T10:32:19.4197061Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T10:32:19.4197124Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T10:32:19.4197188Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T10:32:19.4197292Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T10:32:19.4197361Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T10:32:19.4197423Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T10:32:19.4197497Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T10:32:19.4197561Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T10:32:19.4197625Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T10:32:19.4197694Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T10:32:19.4197846Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-12-04T10:32:19.4197917Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T10:32:19.4197988Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T10:32:19.4198051Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T10:32:19.4198114Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T10:32:19.4198181Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T10:32:19.4198279Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T10:32:19.4198351Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T10:32:19.4198425Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T10:32:19.4198489Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T10:32:19.4198553Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T10:32:19.4198618Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T10:32:19.4198684Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T10:32:19.4198750Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T10:32:19.4198841Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T10:32:19.4198910Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T10:32:19.4198976Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T10:32:19.4199040Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T10:32:19.4199123Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T10:32:19.4199221Z * [new branch] xmfan/fca_cpp_node_passthrough -> 
origin/xmfan/fca_cpp_node_passthrough 2025-12-04T10:32:19.4199371Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T10:32:19.4199517Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T10:32:19.4199618Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T10:32:19.4199685Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T10:32:19.4199746Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T10:32:19.4199832Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T10:32:19.4199910Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T10:32:19.4200005Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T10:32:19.4200126Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T10:32:19.4200229Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T10:32:19.4200292Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T10:32:19.4200364Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T10:32:19.4200453Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T10:32:19.4200533Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T10:32:19.4200595Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T10:32:19.4200669Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T10:32:19.4200727Z * [new branch] zb2p -> origin/zb2p 2025-12-04T10:32:19.4200812Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T10:32:19.4200901Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T10:32:19.4201004Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T10:32:19.4201119Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T10:32:19.4201243Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T10:32:19.4201341Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T10:32:19.4201426Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T10:32:19.4201515Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T10:32:19.4201643Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T10:32:19.4201744Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T10:32:19.4201830Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T10:32:19.4201926Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T10:32:19.4202042Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T10:32:19.4202114Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T10:32:19.4202219Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T10:32:19.4202295Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 
2025-12-04T10:32:19.4202370Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T10:32:19.4202442Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T10:32:19.4202520Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T10:32:19.4202582Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T10:32:19.4202645Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T10:32:19.4202737Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T10:32:19.4202807Z t [tag update] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T10:32:19.4202872Z t [tag update] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T10:32:19.4203008Z * [new tag] trunk/c0cb6e78404416d418350632bfc554710a5f7281 -> trunk/c0cb6e78404416d418350632bfc554710a5f7281 2025-12-04T10:32:19.6164008Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T10:32:19.6350222Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:19.6354404Z ##[endgroup] 2025-12-04T10:32:19.6354728Z ##[group]Determining the checkout info 2025-12-04T10:32:19.6356126Z ##[endgroup] 2025-12-04T10:32:19.6361308Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T10:32:19.6449711Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T10:32:19.6466842Z ##[group]Checking out the ref 2025-12-04T10:32:19.6468683Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:19.7286564Z Previous HEAD position was c0cb6e784044 [DTensor] ExplicitRedistributionContext warning mode (#169452) 2025-12-04T10:32:19.7290133Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T10:32:19.7373501Z ##[endgroup] 2025-12-04T10:32:19.7373718Z ##[group]Setting up auth for fetching submodules 2025-12-04T10:32:19.7379138Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T10:32:19.7405890Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T10:32:19.7425617Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T10:32:19.7449221Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T10:32:19.7464383Z ##[endgroup] 2025-12-04T10:32:19.7464606Z ##[group]Fetching submodules 2025-12-04T10:32:19.7466002Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T10:32:19.7688134Z Synchronizing submodule url for 'android/libs/fbjni' 2025-12-04T10:32:19.7701321Z Synchronizing submodule url for 'third_party/FP16' 2025-12-04T10:32:19.7718513Z Synchronizing submodule url for 'third_party/FXdiv' 2025-12-04T10:32:19.7733257Z Synchronizing submodule url for 'third_party/NNPACK' 2025-12-04T10:32:19.7750335Z Synchronizing submodule url for 'third_party/NVTX' 2025-12-04T10:32:19.7761648Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:19.7774687Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-12-04T10:32:19.7794470Z Synchronizing submodule url for 'third_party/aiter' 2025-12-04T10:32:19.7808607Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:19.7826021Z Synchronizing submodule url for 'third_party/benchmark' 2025-12-04T10:32:19.7836476Z Synchronizing submodule url for 
'third_party/composable_kernel' 2025-12-04T10:32:19.7852734Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-12-04T10:32:19.7867806Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-12-04T10:32:19.7878357Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-12-04T10:32:19.7887699Z Synchronizing submodule url for 'third_party/cutlass' 2025-12-04T10:32:19.7899962Z Synchronizing submodule url for 'third_party/fbgemm' 2025-12-04T10:32:19.7911819Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:19.7923035Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:19.7947817Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:19.7966961Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:19.7989408Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:19.8000711Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:19.8011896Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-12-04T10:32:19.8026915Z Synchronizing submodule url for 'third_party/flash-attention' 2025-12-04T10:32:19.8039360Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:19.8051927Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:19.8066604Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-12-04T10:32:19.8078233Z Synchronizing submodule url for 'third_party/fmt' 2025-12-04T10:32:19.8090239Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:19.8100979Z Synchronizing submodule url for 'third_party/gloo' 2025-12-04T10:32:19.8111291Z Synchronizing submodule url for 'third_party/googletest' 2025-12-04T10:32:19.8122409Z Synchronizing submodule url for 'third_party/ideep' 2025-12-04T10:32:19.8133659Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:19.8148095Z Synchronizing submodule url for 'third_party/ittapi' 2025-12-04T10:32:19.8158384Z Synchronizing submodule url for 'third_party/kineto' 2025-12-04T10:32:19.8171135Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:19.8180946Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:19.8191620Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:19.8205322Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:19.8215216Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:19.8224417Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:19.8236013Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:19.8246644Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:19.8258467Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:19.8269922Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:19.8280855Z 
Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:19.8291522Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:19.8302446Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:19.8315033Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:19.8328350Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:19.8344735Z Synchronizing submodule url for 'third_party/kleidiai' 2025-12-04T10:32:19.8358000Z Synchronizing submodule url for 'third_party/mimalloc' 2025-12-04T10:32:19.8368711Z Synchronizing submodule url for 'third_party/nlohmann' 2025-12-04T10:32:19.8379392Z Synchronizing submodule url for 'third_party/onnx' 2025-12-04T10:32:19.8396221Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:19.8408437Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-12-04T10:32:19.8426048Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:19.8435504Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:19.8446008Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:19.8459687Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:19.8470150Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:19.8479800Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:19.8491503Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:19.8501365Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:19.8515277Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:19.8528760Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:19.8558171Z Synchronizing submodule url for 'third_party/pocketfft' 2025-12-04T10:32:19.8567509Z Synchronizing submodule url for 'third_party/protobuf' 2025-12-04T10:32:19.8579530Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:19.8591200Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:19.8607385Z Synchronizing submodule url for 'third_party/psimd' 2025-12-04T10:32:19.8618528Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-12-04T10:32:19.8634543Z Synchronizing submodule url for 'third_party/pybind11' 2025-12-04T10:32:19.8645346Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-12-04T10:32:19.8655932Z Synchronizing submodule url for 'third_party/sleef' 2025-12-04T10:32:19.8665434Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-12-04T10:32:19.8675049Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:19.8685860Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:19.8696538Z Synchronizing submodule url for 
'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:19.8708586Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:19.8720730Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:19.8746275Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T10:32:19.8999009Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T10:32:19.9064250Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T10:32:19.9122804Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T10:32:19.9263826Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T10:32:19.9336858Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T10:32:19.9393406Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T10:32:20.0251980Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T10:32:20.0406013Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T10:32:20.0586648Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T10:32:20.0704917Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T10:32:20.0877903Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T10:32:20.0942033Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T10:32:20.1585892Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T10:32:20.1692918Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T10:32:20.1818671Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T10:32:20.2573644Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T10:32:20.2886525Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T10:32:20.3651157Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T10:32:20.4308795Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T10:32:20.8791388Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T10:32:20.9015243Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:20.9122961Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T10:32:20.9687554Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T10:32:20.9780524Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T10:32:21.0000127Z Submodule path 
'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T10:32:21.0133982Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T10:32:21.0225579Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T10:32:21.0382847Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T10:32:21.0594493Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T10:32:21.0711877Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T10:32:21.0907674Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:21.0986835Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T10:32:21.4210912Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T10:32:21.4313603Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T10:32:21.4406850Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T10:32:21.4498167Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T10:32:21.4578854Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T10:32:21.4658348Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T10:32:21.4740258Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T10:32:21.4792524Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T10:32:21.4867967Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T10:32:21.4942080Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T10:32:21.5005388Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:21.5097507Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T10:32:21.5159763Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T10:32:21.5231241Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T10:32:21.5307939Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T10:32:21.5363797Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': 
checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T10:32:21.5434860Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T10:32:21.5488428Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:21.5571467Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T10:32:21.5636601Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T10:32:21.5730837Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T10:32:21.7492649Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T10:32:21.7674352Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T10:32:21.7793402Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T10:32:21.7861875Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T10:32:21.7929487Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T10:32:21.7977576Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T10:32:21.8059745Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T10:32:21.8121124Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T10:32:21.8183471Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T10:32:21.8244915Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T10:32:21.8338822Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T10:32:21.8421394Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T10:32:21.8581965Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T10:32:21.8651048Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T10:32:21.9947765Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T10:32:22.0033826Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T10:32:22.0246965Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T10:32:22.0310746Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T10:32:22.0401000Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T10:32:22.0585524Z Submodule path 
'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T10:32:22.0820246Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T10:32:22.1077587Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T10:32:22.1188388Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T10:32:22.1382063Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T10:32:22.1464080Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T10:32:22.1763112Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T10:32:22.1892261Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T10:32:22.1955598Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T10:32:22.1983353Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T10:32:22.2214182Z Entering 'android/libs/fbjni' 2025-12-04T10:32:22.2242572Z Entering 'third_party/FP16' 2025-12-04T10:32:22.2269455Z Entering 'third_party/FXdiv' 2025-12-04T10:32:22.2293749Z Entering 'third_party/NNPACK' 2025-12-04T10:32:22.2327323Z Entering 'third_party/NVTX' 2025-12-04T10:32:22.2354968Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:22.2377861Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:22.2401532Z Entering 'third_party/aiter' 2025-12-04T10:32:22.2419394Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:22.2451712Z Entering 'third_party/benchmark' 2025-12-04T10:32:22.2483595Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:22.2524385Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:22.2548500Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:22.2568650Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:22.2586976Z Entering 'third_party/cutlass' 2025-12-04T10:32:22.2613134Z Entering 'third_party/fbgemm' 2025-12-04T10:32:22.2636601Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:22.2662845Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:22.2687140Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:22.2714947Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:22.2737055Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:22.2755483Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:22.2772953Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:22.2793367Z Entering 'third_party/flash-attention' 2025-12-04T10:32:22.2819081Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:22.2853141Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:22.2883912Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:22.2906749Z Entering 'third_party/fmt' 2025-12-04T10:32:22.2930468Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:22.2951329Z Entering 'third_party/gloo' 2025-12-04T10:32:22.2973348Z Entering 'third_party/googletest' 2025-12-04T10:32:22.2992276Z Entering 'third_party/ideep' 2025-12-04T10:32:22.3010999Z Entering 'third_party/ideep/mkl-dnn' 
2025-12-04T10:32:22.3038839Z Entering 'third_party/ittapi' 2025-12-04T10:32:22.3057729Z Entering 'third_party/kineto' 2025-12-04T10:32:22.3081330Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:22.3101975Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:22.3123313Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:22.3143380Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:22.3166419Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:22.3193458Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:22.3223812Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:22.3244284Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:22.3267845Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:22.3287282Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:22.3305544Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:22.3324269Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:22.3345996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:22.3376211Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:22.3398733Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:22.3421195Z Entering 'third_party/kleidiai' 2025-12-04T10:32:22.3441537Z Entering 'third_party/mimalloc' 2025-12-04T10:32:22.3461762Z Entering 'third_party/nlohmann' 2025-12-04T10:32:22.3481618Z Entering 'third_party/onnx' 2025-12-04T10:32:22.3507177Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:22.3530476Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:22.3558878Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:22.3579073Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:22.3603333Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:22.3624825Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:22.3645789Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:22.3663812Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:22.3683011Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:22.3706791Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:22.3727597Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:22.3747581Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:22.3774749Z Entering 'third_party/pocketfft' 2025-12-04T10:32:22.3794371Z Entering 'third_party/protobuf' 2025-12-04T10:32:22.3813682Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:22.3836870Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:22.3860038Z Entering 'third_party/psimd' 2025-12-04T10:32:22.3880890Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:22.3907544Z Entering 
'third_party/pybind11' 2025-12-04T10:32:22.3928003Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:22.3947804Z Entering 'third_party/sleef' 2025-12-04T10:32:22.3969476Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:22.3990889Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:22.4007553Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:22.4025570Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:22.4048139Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:22.4066363Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:22.4096503Z ##[endgroup] 2025-12-04T10:32:22.4096691Z ##[group]Persisting credentials for submodules 2025-12-04T10:32:22.4101572Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T10:32:22.4259562Z Entering 'android/libs/fbjni' 2025-12-04T10:32:22.4282892Z Entering 'third_party/FP16' 2025-12-04T10:32:22.4304237Z Entering 'third_party/FXdiv' 2025-12-04T10:32:22.4325336Z Entering 'third_party/NNPACK' 2025-12-04T10:32:22.4348464Z Entering 'third_party/NVTX' 2025-12-04T10:32:22.4370100Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:22.4396581Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:22.4426890Z Entering 'third_party/aiter' 2025-12-04T10:32:22.4454882Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:22.4479907Z Entering 'third_party/benchmark' 2025-12-04T10:32:22.4502218Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:22.4525737Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:22.4550709Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:22.4574082Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:22.4594205Z Entering 'third_party/cutlass' 2025-12-04T10:32:22.4618515Z Entering 'third_party/fbgemm' 2025-12-04T10:32:22.4641342Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:22.4668524Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:22.4697699Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:22.4719775Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:22.4741757Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:22.4767562Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:22.4789484Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:22.4819398Z Entering 'third_party/flash-attention' 2025-12-04T10:32:22.4841397Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:22.4867478Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:22.4901416Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:22.4924067Z Entering 'third_party/fmt' 2025-12-04T10:32:22.4945462Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:22.4965784Z Entering 'third_party/gloo' 2025-12-04T10:32:22.4986790Z Entering 'third_party/googletest' 2025-12-04T10:32:22.5013984Z Entering 'third_party/ideep' 2025-12-04T10:32:22.5039850Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:22.5066515Z Entering 'third_party/ittapi' 2025-12-04T10:32:22.5088930Z Entering 'third_party/kineto' 2025-12-04T10:32:22.5110671Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:22.5145307Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 
2025-12-04T10:32:22.5168187Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:22.5191417Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:22.5212833Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:22.5234780Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:22.5256620Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:22.5278312Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:22.5298532Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:22.5323543Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:22.5342562Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:22.5363464Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:22.5384286Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:22.5412037Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:22.5434395Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:22.5455500Z Entering 'third_party/kleidiai' 2025-12-04T10:32:22.5479838Z Entering 'third_party/mimalloc' 2025-12-04T10:32:22.5508156Z Entering 'third_party/nlohmann' 2025-12-04T10:32:22.5536596Z Entering 'third_party/onnx' 2025-12-04T10:32:22.5564969Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:22.5592915Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:22.5619442Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:22.5641571Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:22.5663212Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:22.5692928Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:22.5715598Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:22.5736067Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:22.5755484Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:22.5777464Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:22.5799264Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:22.5825749Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:22.5855508Z Entering 'third_party/pocketfft' 2025-12-04T10:32:22.5877879Z Entering 'third_party/protobuf' 2025-12-04T10:32:22.5899150Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:22.5922744Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:22.5943770Z Entering 'third_party/psimd' 2025-12-04T10:32:22.5965143Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:22.5984750Z Entering 'third_party/pybind11' 2025-12-04T10:32:22.6013734Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:22.6047301Z Entering 'third_party/sleef' 2025-12-04T10:32:22.6071038Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:22.6094326Z Entering 'third_party/tensorpipe/third_party/googletest' 
2025-12-04T10:32:22.6132465Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:22.6155999Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:22.6181461Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:22.6207135Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:22.6249189Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T10:32:22.6418143Z Entering 'android/libs/fbjni' 2025-12-04T10:32:22.6447151Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T10:32:22.6455823Z Entering 'third_party/FP16' 2025-12-04T10:32:22.6475406Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T10:32:22.6485176Z Entering 'third_party/FXdiv' 2025-12-04T10:32:22.6507037Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T10:32:22.6517342Z Entering 'third_party/NNPACK' 2025-12-04T10:32:22.6538240Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T10:32:22.6554270Z Entering 'third_party/NVTX' 2025-12-04T10:32:22.6573143Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T10:32:22.6587110Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:22.6610725Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T10:32:22.6620464Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:22.6639074Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T10:32:22.6653667Z Entering 'third_party/aiter' 2025-12-04T10:32:22.6673347Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T10:32:22.6682966Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:22.6715891Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T10:32:22.6731776Z Entering 'third_party/benchmark' 2025-12-04T10:32:22.6759558Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:22.6769843Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:22.6800872Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T10:32:22.6814264Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:22.6838933Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T10:32:22.6850101Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:22.6878101Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T10:32:22.6888759Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:22.6914950Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T10:32:22.6929229Z Entering 'third_party/cutlass' 2025-12-04T10:32:22.6968263Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 
2025-12-04T10:32:22.6991746Z Entering 'third_party/fbgemm' 2025-12-04T10:32:22.7025923Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T10:32:22.7046836Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:22.7068416Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T10:32:22.7084585Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:22.7105212Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T10:32:22.7118757Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:22.7143421Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T10:32:22.7157467Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:22.7187523Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T10:32:22.7212086Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:22.7241163Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T10:32:22.7253074Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:22.7275812Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T10:32:22.7294365Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:22.7320999Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T10:32:22.7336220Z Entering 'third_party/flash-attention' 2025-12-04T10:32:22.7359799Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T10:32:22.7374440Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:22.7397858Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T10:32:22.7415454Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:22.7449303Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T10:32:22.7463865Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:22.7483903Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T10:32:22.7496507Z Entering 'third_party/fmt' 2025-12-04T10:32:22.7533575Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:22.7545116Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:22.7567042Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T10:32:22.7575900Z Entering 'third_party/gloo' 2025-12-04T10:32:22.7600224Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T10:32:22.7610999Z Entering 'third_party/googletest' 2025-12-04T10:32:22.7631531Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:22.7640914Z Entering 'third_party/ideep' 2025-12-04T10:32:22.7664811Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T10:32:22.7674984Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:22.7702241Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T10:32:22.7721719Z Entering 'third_party/ittapi' 2025-12-04T10:32:22.7743800Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T10:32:22.7755270Z Entering 'third_party/kineto' 2025-12-04T10:32:22.7787506Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T10:32:22.7796983Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:22.7817848Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T10:32:22.7827110Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:22.7847585Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T10:32:22.7856920Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:22.7882819Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T10:32:22.7895476Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:22.7920268Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:22.7932391Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:22.7957416Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T10:32:22.7967802Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:22.7992398Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T10:32:22.8004578Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:22.8032140Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T10:32:22.8041946Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:22.8073518Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:22.8084470Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:22.8113376Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T10:32:22.8126394Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:22.8153427Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T10:32:22.8169729Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:22.8199526Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:22.8211217Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:22.8234080Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:22.8245031Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:22.8267383Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:22.8280472Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:22.8299827Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T10:32:22.8309186Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:22.8328169Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T10:32:22.8339335Z Entering 'third_party/kleidiai' 2025-12-04T10:32:22.8363509Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T10:32:22.8373448Z Entering 'third_party/mimalloc' 2025-12-04T10:32:22.8397137Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T10:32:22.8405989Z Entering 'third_party/nlohmann' 2025-12-04T10:32:22.8617845Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T10:32:22.8628087Z Entering 'third_party/onnx' 2025-12-04T10:32:22.8648873Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T10:32:22.8674923Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:22.8704690Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:22.8717255Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:22.8743268Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T10:32:22.8753158Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:22.8775577Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:22.8784228Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:22.8802473Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:22.8811648Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:22.8838382Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T10:32:22.8847948Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:22.8865417Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T10:32:22.8873860Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:22.8898876Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T10:32:22.8909670Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:22.8928399Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T10:32:22.8937464Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:22.8954920Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:22.8963818Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:22.8997477Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:22.9013231Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:22.9036947Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:22.9048060Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:22.9068786Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T10:32:22.9092044Z Entering 'third_party/pocketfft' 2025-12-04T10:32:22.9113918Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T10:32:22.9123407Z Entering 'third_party/protobuf' 2025-12-04T10:32:22.9141713Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T10:32:22.9152260Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:22.9174655Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:22.9185053Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:22.9204191Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:22.9215127Z Entering 'third_party/psimd' 2025-12-04T10:32:22.9234155Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T10:32:22.9243558Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:22.9268082Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T10:32:22.9278522Z Entering 'third_party/pybind11' 2025-12-04T10:32:22.9299740Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 
2025-12-04T10:32:22.9308446Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:22.9329686Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T10:32:22.9339950Z Entering 'third_party/sleef' 2025-12-04T10:32:22.9360119Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T10:32:22.9369707Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:22.9389883Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T10:32:22.9399127Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:22.9418261Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:22.9429239Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:22.9446414Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T10:32:22.9454876Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:22.9473624Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T10:32:22.9487375Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:22.9516599Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:22.9527030Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:22.9550359Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T10:32:22.9985720Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T10:32:23.0189735Z Entering 'android/libs/fbjni' 2025-12-04T10:32:23.0213989Z Entering 'third_party/FP16' 2025-12-04T10:32:23.0236272Z Entering 'third_party/FXdiv' 2025-12-04T10:32:23.0262997Z Entering 'third_party/NNPACK' 2025-12-04T10:32:23.0288295Z Entering 'third_party/NVTX' 2025-12-04T10:32:23.0314300Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:23.0340743Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:23.0371654Z Entering 'third_party/aiter' 2025-12-04T10:32:23.0403291Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:23.0434271Z Entering 'third_party/benchmark' 2025-12-04T10:32:23.0460064Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:23.0495390Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:23.0518199Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:23.0543973Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:23.0566349Z Entering 'third_party/cutlass' 2025-12-04T10:32:23.0597845Z Entering 'third_party/fbgemm' 2025-12-04T10:32:23.0620800Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:23.0640761Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:23.0673481Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:23.0704215Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:23.0731920Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:23.0754488Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:23.0782593Z Entering 'third_party/fbgemm/external/json' 
2025-12-04T10:32:23.0812007Z Entering 'third_party/flash-attention' 2025-12-04T10:32:23.0834459Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:23.0865852Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:23.0896506Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:23.0918853Z Entering 'third_party/fmt' 2025-12-04T10:32:23.0944378Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:23.0965978Z Entering 'third_party/gloo' 2025-12-04T10:32:23.0989418Z Entering 'third_party/googletest' 2025-12-04T10:32:23.1013810Z Entering 'third_party/ideep' 2025-12-04T10:32:23.1041727Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:23.1067654Z Entering 'third_party/ittapi' 2025-12-04T10:32:23.1088854Z Entering 'third_party/kineto' 2025-12-04T10:32:23.1113114Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:23.1143532Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:23.1169886Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:23.1196704Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:23.1218327Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:23.1238335Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:23.1263558Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:23.1291737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:23.1319306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:23.1340168Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:23.1360979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:23.1385111Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:23.1410437Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:23.1437997Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:23.1458593Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:23.1483152Z Entering 'third_party/kleidiai' 2025-12-04T10:32:23.1513047Z Entering 'third_party/mimalloc' 2025-12-04T10:32:23.1542105Z Entering 'third_party/nlohmann' 2025-12-04T10:32:23.1570334Z Entering 'third_party/onnx' 2025-12-04T10:32:23.1617369Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:23.1643243Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:23.1675471Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:23.1698741Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:23.1725600Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:23.1750244Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:23.1770055Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:23.1793047Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:23.1818535Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:23.1839804Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:23.1867402Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:23.1890478Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:23.1933177Z Entering 'third_party/pocketfft' 2025-12-04T10:32:23.1954559Z Entering 'third_party/protobuf' 2025-12-04T10:32:23.1985268Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:23.2016334Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:23.2041547Z Entering 'third_party/psimd' 2025-12-04T10:32:23.2062818Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:23.2090922Z Entering 'third_party/pybind11' 2025-12-04T10:32:23.2119880Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:23.2144942Z Entering 'third_party/sleef' 2025-12-04T10:32:23.2170979Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:23.2193533Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:23.2212354Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:23.2235003Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:23.2261476Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:23.2283521Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:23.2320748Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T10:32:23.2504429Z Entering 'android/libs/fbjni' 2025-12-04T10:32:23.2528263Z Entering 'third_party/FP16' 2025-12-04T10:32:23.2547482Z Entering 'third_party/FXdiv' 2025-12-04T10:32:23.2568928Z Entering 'third_party/NNPACK' 2025-12-04T10:32:23.2587577Z Entering 'third_party/NVTX' 2025-12-04T10:32:23.2606781Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:23.2628608Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:23.2662874Z Entering 'third_party/aiter' 2025-12-04T10:32:23.2685927Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:23.2715944Z Entering 'third_party/benchmark' 2025-12-04T10:32:23.2734341Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:23.2757115Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:23.2776753Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:23.2796534Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:23.2817121Z Entering 'third_party/cutlass' 2025-12-04T10:32:23.2848167Z Entering 'third_party/fbgemm' 2025-12-04T10:32:23.2882557Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:23.2912781Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:23.2947971Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:23.2984854Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:23.3008228Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:23.3033800Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:23.3054960Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:23.3085084Z Entering 'third_party/flash-attention' 2025-12-04T10:32:23.3113149Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:23.3143049Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:23.3168759Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:23.3201215Z Entering 'third_party/fmt' 2025-12-04T10:32:23.3225069Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:23.3245824Z 
Entering 'third_party/gloo' 2025-12-04T10:32:23.3270968Z Entering 'third_party/googletest' 2025-12-04T10:32:23.3293118Z Entering 'third_party/ideep' 2025-12-04T10:32:23.3315958Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:23.3339652Z Entering 'third_party/ittapi' 2025-12-04T10:32:23.3365386Z Entering 'third_party/kineto' 2025-12-04T10:32:23.3384837Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:23.3406908Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:23.3427151Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:23.3445056Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:23.3470467Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:23.3490905Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:23.3514171Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:23.3533075Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:23.3550304Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:23.3572690Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:23.3596714Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:23.3616381Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:23.3638896Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:23.3661819Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:23.3680526Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:23.3703806Z Entering 'third_party/kleidiai' 2025-12-04T10:32:23.3723568Z Entering 'third_party/mimalloc' 2025-12-04T10:32:23.3745112Z Entering 'third_party/nlohmann' 2025-12-04T10:32:23.3766857Z Entering 'third_party/onnx' 2025-12-04T10:32:23.3791585Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:23.3814327Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:23.3835846Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:23.3856401Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:23.3877568Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:23.3896419Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:23.3922425Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:23.3947248Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:23.3965928Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:23.3984724Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:23.4023460Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:23.4047417Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:23.4075863Z Entering 'third_party/pocketfft' 2025-12-04T10:32:23.4098662Z Entering 'third_party/protobuf' 2025-12-04T10:32:23.4119351Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:23.4137974Z 
Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:23.4157946Z Entering 'third_party/psimd' 2025-12-04T10:32:23.4177422Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:23.4197002Z Entering 'third_party/pybind11' 2025-12-04T10:32:23.4216515Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:23.4240712Z Entering 'third_party/sleef' 2025-12-04T10:32:23.4259698Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:23.4278697Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:23.4308355Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:23.4327620Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:23.4347924Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:23.4366108Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:23.4396362Z ##[endgroup] 2025-12-04T10:32:23.4557690Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T10:32:23.4646281Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:23.4763979Z ##[group]Run actions/checkout@v4 2025-12-04T10:32:23.4764117Z with: 2025-12-04T10:32:23.4764226Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:23.4764371Z fetch-depth: 0 2025-12-04T10:32:23.4764477Z submodules: recursive 2025-12-04T10:32:23.4764580Z show-progress: false 2025-12-04T10:32:23.4764694Z repository: pytorch/pytorch 2025-12-04T10:32:23.4764847Z token: *** 2025-12-04T10:32:23.4764936Z ssh-strict: true 2025-12-04T10:32:23.4765032Z ssh-user: git 2025-12-04T10:32:23.4765128Z persist-credentials: true 2025-12-04T10:32:23.4765237Z clean: true 2025-12-04T10:32:23.4765360Z sparse-checkout-cone-mode: true 2025-12-04T10:32:23.4765475Z fetch-tags: false 2025-12-04T10:32:23.4765568Z lfs: false 2025-12-04T10:32:23.4765657Z set-safe-directory: true 2025-12-04T10:32:23.4765761Z env: 2025-12-04T10:32:23.4765856Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:23.4765954Z ##[endgroup] 2025-12-04T10:32:23.5216619Z Syncing repository: pytorch/pytorch 2025-12-04T10:32:23.5216918Z ##[group]Getting Git version info 2025-12-04T10:32:23.5217087Z Working directory is '/home/runner/_work/pytorch/pytorch' 2025-12-04T10:32:23.5229513Z [command]/usr/bin/git version 2025-12-04T10:32:23.5249687Z git version 2.52.0 2025-12-04T10:32:23.5260356Z ##[endgroup] 2025-12-04T10:32:23.5264272Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/08e0f1a2-b75d-4e90-b750-dafa61b62478/.gitconfig' 2025-12-04T10:32:23.5269222Z Temporarily overriding HOME='/home/runner/_work/_temp/08e0f1a2-b75d-4e90-b750-dafa61b62478' before making global git config changes 2025-12-04T10:32:23.5269615Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T10:32:23.5276707Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T10:32:23.5296960Z [command]/usr/bin/git config --local --get remote.origin.url 2025-12-04T10:32:23.5314186Z https://github.com/pytorch/pytorch 2025-12-04T10:32:23.5321024Z ##[group]Removing previously created refs, to avoid conflicts 2025-12-04T10:32:23.5322886Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-12-04T10:32:23.5337062Z HEAD 2025-12-04T10:32:23.5363888Z ##[endgroup] 2025-12-04T10:32:23.5365688Z [command]/usr/bin/git submodule status 2025-12-04T10:32:23.5563850Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 2025-12-04T10:32:23.5610903Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 
third_party/FP16 (4dfe081) 2025-12-04T10:32:23.5652909Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-12-04T10:32:23.5711024Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-12-04T10:32:23.5746584Z 3ebbc93ded7285963bff932c678fa367eb393ba6 third_party/NVTX (v3.1.0-313-g3ebbc93) 2025-12-04T10:32:23.5798006Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-12-04T10:32:23.6083939Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-12-04T10:32:23.6129251Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-12-04T10:32:23.6146682Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-12-04T10:32:23.6210382Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-12-04T10:32:23.6301056Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0) 2025-12-04T10:32:23.6376575Z f858c30bcb16f8effd5ff46996f0514539e17abc third_party/cpuinfo (f858c30) 2025-12-04T10:32:23.6400104Z 0b1577c8c83401237d601d0d0db5210506705396 third_party/cudnn_frontend (v0.5-61-g0b1577c) 2025-12-04T10:32:23.6462649Z f88806b1e31dfa579842638740216dd41fc6c588 third_party/cutlass (v4.3.1) 2025-12-04T10:32:23.6495860Z c0b988d39a9e47c794d699f29930ed4d7c7e13a4 third_party/fbgemm (v1.4.0-rc1-2-gc0b988d39) 2025-12-04T10:32:23.6546891Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-12-04T10:32:23.6560773Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-12-04T10:32:23.6800702Z 407c905e45ad75fc29bf0f9bb7c5c2fd3475976f third_party/fmt (12.1.0) 2025-12-04T10:32:23.6873595Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 2025-12-04T10:32:23.6956279Z 54cbae0d3a67fa890b4c3d9ee162b7860315e341 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-37-g54cbae0) 2025-12-04T10:32:23.7099987Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-12-04T10:32:23.7148580Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-12-04T10:32:23.7197018Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-12-04T10:32:23.7323014Z 31f85df8fbd89c188f14ef10f1ec65379786b943 third_party/kineto (heads/main) 2025-12-04T10:32:23.7348940Z d7770c89632329a9914ef1a90289917597639cbe third_party/kleidiai (v1.15.0) 2025-12-04T10:32:23.7370821Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-12-04T10:32:23.7393858Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-12-04T10:32:23.7600719Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-12-04T10:32:23.7625023Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-12-04T10:32:23.7647869Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-12-04T10:32:23.7864276Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b) 2025-12-04T10:32:23.7904783Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-12-04T10:32:23.7940554Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-12-04T10:32:23.7964620Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 
(v3.0.1) 2025-12-04T10:32:23.8027602Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-12-04T10:32:23.8076100Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-12-04T10:32:23.8129202Z 2b4cd91092d335a697416b2a3cb398283246849d third_party/tensorpipe (heads/main) 2025-12-04T10:32:23.8139635Z ##[group]Cleaning the repository 2025-12-04T10:32:23.8143808Z [command]/usr/bin/git clean -ffdx 2025-12-04T10:32:23.8259071Z [command]/usr/bin/git reset --hard HEAD 2025-12-04T10:32:23.9075315Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T10:32:23.9139844Z ##[endgroup] 2025-12-04T10:32:23.9142347Z ##[group]Disabling automatic garbage collection 2025-12-04T10:32:23.9147226Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T10:32:23.9173619Z ##[endgroup] 2025-12-04T10:32:23.9173982Z ##[group]Setting up auth 2025-12-04T10:32:23.9176342Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T10:32:23.9196035Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T10:32:23.9377333Z Entering 'android/libs/fbjni' 2025-12-04T10:32:23.9406117Z Entering 'third_party/FP16' 2025-12-04T10:32:23.9429190Z Entering 'third_party/FXdiv' 2025-12-04T10:32:23.9457389Z Entering 'third_party/NNPACK' 2025-12-04T10:32:23.9484326Z Entering 'third_party/NVTX' 2025-12-04T10:32:23.9541382Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:23.9565292Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:23.9594221Z Entering 'third_party/aiter' 2025-12-04T10:32:23.9618399Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:23.9644246Z Entering 'third_party/benchmark' 2025-12-04T10:32:23.9668079Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:23.9693621Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:23.9717921Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:23.9740982Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:23.9761579Z Entering 'third_party/cutlass' 2025-12-04T10:32:23.9786466Z Entering 'third_party/fbgemm' 2025-12-04T10:32:23.9810213Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:23.9836706Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:23.9860286Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:23.9900596Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:23.9928025Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:23.9954665Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:23.9976947Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:24.0003073Z Entering 'third_party/flash-attention' 2025-12-04T10:32:24.0024302Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:24.0048717Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:24.0075798Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:24.0096105Z Entering 'third_party/fmt' 2025-12-04T10:32:24.0126149Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:24.0155454Z Entering 'third_party/gloo' 2025-12-04T10:32:24.0181673Z Entering 'third_party/googletest' 2025-12-04T10:32:24.0202356Z Entering 'third_party/ideep' 2025-12-04T10:32:24.0222780Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:24.0247546Z Entering 'third_party/ittapi' 
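The "Setting up auth" step above first strips any stale core.sshCommand override from the superproject and from every nested submodule before fresh credentials are installed; the 'Entering ...' lines that follow are the per-submodule output of that sweep. A minimal manual equivalent of the cleanup, run from the repository root and tolerant of repos that have no such setting, would be roughly:

  # remove a per-repo SSH command override, if one exists
  git config --local --name-only --get-regexp 'core\.sshCommand' \
    && git config --local --unset-all 'core.sshCommand' || :
  # repeat the same cleanup inside every submodule, recursively
  git submodule foreach --recursive sh -c \
    "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"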
2025-12-04T10:32:24.0268723Z Entering 'third_party/kineto' 2025-12-04T10:32:24.0293345Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:24.0322755Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:24.0353166Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:24.0374547Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:24.0394467Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:24.0415357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:24.0441594Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:24.0480392Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:24.0516770Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:24.0544107Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:24.0566849Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:24.0588876Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:24.0613301Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:24.0650640Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:24.0672088Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:24.0694457Z Entering 'third_party/kleidiai' 2025-12-04T10:32:24.0717010Z Entering 'third_party/mimalloc' 2025-12-04T10:32:24.0739857Z Entering 'third_party/nlohmann' 2025-12-04T10:32:24.0762332Z Entering 'third_party/onnx' 2025-12-04T10:32:24.0787922Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:24.0815878Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:24.0836502Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:24.0865583Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:24.0887898Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:24.0913730Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:24.0936601Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:24.0956493Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:24.0977156Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:24.0997802Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:24.1020568Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:24.1048769Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:24.1079319Z Entering 'third_party/pocketfft' 2025-12-04T10:32:24.1102060Z Entering 'third_party/protobuf' 2025-12-04T10:32:24.1131403Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:24.1155171Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:24.1183725Z Entering 'third_party/psimd' 2025-12-04T10:32:24.1205884Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:24.1228378Z Entering 'third_party/pybind11' 2025-12-04T10:32:24.1250324Z 
Entering 'third_party/python-peachpy' 2025-12-04T10:32:24.1270443Z Entering 'third_party/sleef' 2025-12-04T10:32:24.1294194Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:24.1320133Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:24.1343862Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:24.1363838Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:24.1382609Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:24.1403186Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:24.1444357Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T10:32:24.1459771Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1467172Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T10:32:24.1489407Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T10:32:24.1667436Z Entering 'android/libs/fbjni' 2025-12-04T10:32:24.1678557Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1699942Z Entering 'third_party/FP16' 2025-12-04T10:32:24.1713412Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1738183Z Entering 'third_party/FXdiv' 2025-12-04T10:32:24.1751822Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1769030Z Entering 'third_party/NNPACK' 2025-12-04T10:32:24.1782117Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1798841Z Entering 'third_party/NVTX' 2025-12-04T10:32:24.1811609Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1831327Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:24.1843845Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1861552Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:24.1872997Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1895643Z Entering 'third_party/aiter' 2025-12-04T10:32:24.1907928Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1925334Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:24.1936831Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1963984Z Entering 'third_party/benchmark' 2025-12-04T10:32:24.1976490Z http.https://github.com/.extraheader 2025-12-04T10:32:24.1995705Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:24.2007823Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2029965Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:24.2047271Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2073479Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:24.2089704Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2111637Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:24.2124746Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2148691Z Entering 'third_party/cutlass' 2025-12-04T10:32:24.2161684Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2181003Z Entering 'third_party/fbgemm' 2025-12-04T10:32:24.2194551Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2211999Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:24.2224747Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2248257Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:24.2264803Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2289052Z 
Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:24.2307803Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2335380Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:24.2361605Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2389910Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:24.2408468Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2428614Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:24.2443477Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2465878Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:24.2478633Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2498186Z Entering 'third_party/flash-attention' 2025-12-04T10:32:24.2509846Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2526873Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:24.2543344Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2565720Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:24.2578924Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2606046Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:24.2621185Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2640715Z Entering 'third_party/fmt' 2025-12-04T10:32:24.2652379Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2668111Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:24.2680572Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2696150Z Entering 'third_party/gloo' 2025-12-04T10:32:24.2710089Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2726535Z Entering 'third_party/googletest' 2025-12-04T10:32:24.2739410Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2754668Z Entering 'third_party/ideep' 2025-12-04T10:32:24.2767650Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2785918Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:24.2799007Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2820387Z Entering 'third_party/ittapi' 2025-12-04T10:32:24.2836988Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2853813Z Entering 'third_party/kineto' 2025-12-04T10:32:24.2867581Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2884354Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:24.2897464Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2914492Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:24.2929449Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2945482Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:24.2956812Z http.https://github.com/.extraheader 2025-12-04T10:32:24.2973145Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:24.2984362Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3003950Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:24.3014719Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3030536Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:24.3046499Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3068310Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:24.3083142Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3099708Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:24.3111363Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3127774Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:24.3138767Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3156006Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:24.3171451Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3188578Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:24.3199665Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3214785Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:24.3228367Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3246114Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:24.3258905Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3277920Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:24.3289609Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3304961Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:24.3317546Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3341004Z Entering 'third_party/kleidiai' 2025-12-04T10:32:24.3353905Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3370788Z Entering 'third_party/mimalloc' 2025-12-04T10:32:24.3385092Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3409851Z Entering 'third_party/nlohmann' 2025-12-04T10:32:24.3423677Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3443514Z Entering 'third_party/onnx' 2025-12-04T10:32:24.3459541Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3482563Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:24.3509843Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3532975Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:24.3546366Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3563436Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:24.3577720Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3594327Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:24.3607463Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3623034Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:24.3643724Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3658245Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:24.3671576Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3689785Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:24.3702621Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3726497Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:24.3738900Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3754061Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:24.3765857Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3784066Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:24.3796503Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3815374Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:24.3830664Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3847740Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:24.3859637Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3888387Z Entering 'third_party/pocketfft' 2025-12-04T10:32:24.3900529Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3917452Z Entering 'third_party/protobuf' 2025-12-04T10:32:24.3933015Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3951183Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:24.3963601Z http.https://github.com/.extraheader 2025-12-04T10:32:24.3984431Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:24.3997438Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4018310Z Entering 'third_party/psimd' 2025-12-04T10:32:24.4031457Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4052545Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:24.4065756Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4083801Z Entering 'third_party/pybind11' 2025-12-04T10:32:24.4096701Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4112400Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:24.4125614Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4142820Z Entering 'third_party/sleef' 2025-12-04T10:32:24.4157462Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4178962Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:24.4193235Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4208816Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:24.4224954Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4240943Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:24.4259116Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4279533Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:24.4293285Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4310290Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:24.4323812Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4340875Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:24.4352682Z http.https://github.com/.extraheader 2025-12-04T10:32:24.4390996Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.4408727Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T10:32:24.4558243Z Entering 'android/libs/fbjni' 2025-12-04T10:32:24.4567980Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T10:32:24.4576996Z Entering 'third_party/FP16' 2025-12-04T10:32:24.4587169Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T10:32:24.4595711Z Entering 'third_party/FXdiv' 2025-12-04T10:32:24.4608415Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T10:32:24.4617136Z Entering 'third_party/NNPACK' 2025-12-04T10:32:24.4630478Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T10:32:24.4640459Z Entering 'third_party/NVTX' 2025-12-04T10:32:24.4649747Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T10:32:24.4658348Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:24.4668764Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T10:32:24.4677397Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:24.4687540Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T10:32:24.4701603Z Entering 'third_party/aiter' 2025-12-04T10:32:24.4712114Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T10:32:24.4726152Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:24.4735956Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T10:32:24.4748739Z Entering 'third_party/benchmark' 2025-12-04T10:32:24.4759321Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:24.4771997Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:24.4781959Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T10:32:24.4793702Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:24.4803979Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T10:32:24.4812370Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:24.4822196Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T10:32:24.4831237Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:24.4842312Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T10:32:24.4850854Z Entering 'third_party/cutlass' 2025-12-04T10:32:24.4860330Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T10:32:24.4872493Z Entering 'third_party/fbgemm' 2025-12-04T10:32:24.4882544Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T10:32:24.4892565Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:24.4902401Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T10:32:24.4910349Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:24.4919775Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T10:32:24.4931551Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:24.4940682Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T10:32:24.4953404Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:24.4964229Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T10:32:24.4982409Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:24.5001926Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T10:32:24.5011650Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:24.5021194Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T10:32:24.5029692Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:24.5039252Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T10:32:24.5050278Z Entering 'third_party/flash-attention' 2025-12-04T10:32:24.5061077Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T10:32:24.5070483Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:24.5082641Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T10:32:24.5095116Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:24.5103796Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T10:32:24.5116398Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:24.5126766Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T10:32:24.5136658Z Entering 'third_party/fmt' 2025-12-04T10:32:24.5147411Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:24.5168469Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:24.5179854Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T10:32:24.5187866Z Entering 'third_party/gloo' 2025-12-04T10:32:24.5198504Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T10:32:24.5208927Z Entering 'third_party/googletest' 2025-12-04T10:32:24.5219966Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:24.5243326Z Entering 'third_party/ideep' 2025-12-04T10:32:24.5243644Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T10:32:24.5251917Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:24.5262535Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T10:32:24.5280127Z Entering 'third_party/ittapi' 2025-12-04T10:32:24.5290760Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T10:32:24.5299956Z Entering 'third_party/kineto' 2025-12-04T10:32:24.5310870Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T10:32:24.5323146Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:24.5335830Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T10:32:24.5344050Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:24.5353106Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T10:32:24.5363110Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:24.5377907Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T10:32:24.5388848Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:24.5407979Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:24.5418338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:24.5429051Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T10:32:24.5437188Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:24.5452377Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T10:32:24.5466073Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:24.5474982Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T10:32:24.5483823Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:24.5492724Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:24.5501576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:24.5510910Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T10:32:24.5519962Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:24.5531381Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T10:32:24.5538548Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:24.5547241Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:24.5555539Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:24.5564194Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:24.5574625Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:24.5584130Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:24.5596276Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:24.5605104Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T10:32:24.5613554Z 
Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:24.5624181Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T10:32:24.5636243Z Entering 'third_party/kleidiai' 2025-12-04T10:32:24.5646871Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T10:32:24.5658831Z Entering 'third_party/mimalloc' 2025-12-04T10:32:24.5668626Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T10:32:24.5677837Z Entering 'third_party/nlohmann' 2025-12-04T10:32:24.5693741Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T10:32:24.5702963Z Entering 'third_party/onnx' 2025-12-04T10:32:24.5712069Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T10:32:24.5726402Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:24.5735938Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:24.5747392Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:24.5763062Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T10:32:24.5770491Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:24.5780864Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:24.5789380Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:24.5799022Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:24.5807415Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:24.5815902Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T10:32:24.5823569Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:24.5835689Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T10:32:24.5845498Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:24.5855238Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T10:32:24.5865245Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:24.5874888Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T10:32:24.5883661Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:24.5894239Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:24.5902831Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:24.5912837Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:24.5921885Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:24.5930831Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:24.5941721Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:24.5950540Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T10:32:24.5969348Z Entering 'third_party/pocketfft' 2025-12-04T10:32:24.5979465Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T10:32:24.5991165Z Entering 'third_party/protobuf' 2025-12-04T10:32:24.6001391Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T10:32:24.6011592Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:24.6022125Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:24.6031348Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:24.6040254Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:24.6051090Z Entering 'third_party/psimd' 2025-12-04T10:32:24.6060282Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T10:32:24.6069445Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:24.6081753Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T10:32:24.6090847Z Entering 'third_party/pybind11' 2025-12-04T10:32:24.6100939Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:24.6117967Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:24.6128384Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T10:32:24.6137041Z Entering 'third_party/sleef' 2025-12-04T10:32:24.6146601Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T10:32:24.6155022Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:24.6165280Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T10:32:24.6174341Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:24.6184456Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:24.6194826Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:24.6203861Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T10:32:24.6211912Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:24.6222813Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T10:32:24.6231597Z Entering 'third_party/tensorpipe/third_party/pybind11' 
2025-12-04T10:32:24.6240781Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:24.6251646Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:24.6261022Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T10:32:24.6287170Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6303336Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6318137Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6331181Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6347782Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6363567Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6377917Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6395016Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6408200Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6421434Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6434053Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6453939Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6471153Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6484169Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6498294Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6512431Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6525553Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6539957Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6558220Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6571205Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6584175Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6596637Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6609809Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6622458Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6635884Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6649290Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6662913Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6675835Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6689096Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6701848Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6714676Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6727922Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6741049Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6755067Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 
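Each [command] line in this block inspects one submodule's private config file (git keeps them under .git/modules/<submodule path>/config) for conditional-include sections before the auth headers are rewritten. A condensed sketch of the same scan, assuming the standard .git/modules layout shown in these paths, would be:

  # walk every submodule config file and report any includeIf.gitdir sections
  find .git/modules -type f -name config | while read -r cfg; do
    git config --file "$cfg" --name-only --get-regexp '^includeIf\.gitdir:' || :
  done

The action performs the same check file by file, which is why one git config --file invocation appears per submodule here.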
2025-12-04T10:32:24.6770436Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6788264Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6801733Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6815905Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6829976Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6843137Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6856980Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6870681Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6891295Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6904399Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6917607Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6930588Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6944120Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6956967Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6970766Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.6983915Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7002898Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7016116Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7029424Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7042941Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7056415Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7070334Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7088379Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7102703Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7116515Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7131278Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7144673Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7158565Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7172479Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7187808Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7200769Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7214409Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7228237Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7241979Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7256147Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7276520Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7290360Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7303031Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7316694Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7330587Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7343870Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7358607Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7372462Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7386615Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7400970Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7414840Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7428492Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T10:32:24.7444859Z [command]/usr/bin/git config --local http.https://github.com/.extraheader 
AUTHORIZATION: basic *** 2025-12-04T10:32:24.7465419Z ##[endgroup] 2025-12-04T10:32:24.7465627Z ##[group]Fetching the repository 2025-12-04T10:32:24.7469325Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T10:32:26.1603220Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T10:32:26.1779457Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:26.1783622Z ##[endgroup] 2025-12-04T10:32:26.1783931Z ##[group]Determining the checkout info 2025-12-04T10:32:26.1785513Z ##[endgroup] 2025-12-04T10:32:26.1790841Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T10:32:26.1884496Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T10:32:26.1906406Z ##[group]Checking out the ref 2025-12-04T10:32:26.1908242Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:26.2212218Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T10:32:26.2217067Z ##[endgroup] 2025-12-04T10:32:26.2217305Z ##[group]Setting up auth for fetching submodules 2025-12-04T10:32:26.2220817Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T10:32:26.2250995Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T10:32:26.2269115Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T10:32:26.2285435Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T10:32:26.2299881Z ##[endgroup] 2025-12-04T10:32:26.2300074Z ##[group]Fetching submodules 2025-12-04T10:32:26.2301774Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T10:32:26.2486686Z Synchronizing submodule url for 'android/libs/fbjni' 2025-12-04T10:32:26.2496711Z Synchronizing submodule url for 'third_party/FP16' 2025-12-04T10:32:26.2511387Z Synchronizing submodule url for 'third_party/FXdiv' 2025-12-04T10:32:26.2522763Z Synchronizing submodule url for 'third_party/NNPACK' 2025-12-04T10:32:26.2534069Z Synchronizing submodule url for 'third_party/NVTX' 2025-12-04T10:32:26.2545676Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:26.2557524Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-12-04T10:32:26.2577908Z Synchronizing submodule url for 'third_party/aiter' 2025-12-04T10:32:26.2590917Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:26.2605313Z Synchronizing submodule url for 'third_party/benchmark' 2025-12-04T10:32:26.2615196Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-12-04T10:32:26.2627862Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-12-04T10:32:26.2639405Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-12-04T10:32:26.2650048Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-12-04T10:32:26.2660695Z Synchronizing submodule url for 'third_party/cutlass' 2025-12-04T10:32:26.2681068Z Synchronizing submodule url for 'third_party/fbgemm' 2025-12-04T10:32:26.2697715Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:26.2709001Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:26.2723022Z Synchronizing 
submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:26.2734587Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:26.2748334Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:26.2757301Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:26.2767566Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-12-04T10:32:26.2780037Z Synchronizing submodule url for 'third_party/flash-attention' 2025-12-04T10:32:26.2790567Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:26.2809468Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:26.2825657Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-12-04T10:32:26.2837059Z Synchronizing submodule url for 'third_party/fmt' 2025-12-04T10:32:26.2847810Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:26.2858075Z Synchronizing submodule url for 'third_party/gloo' 2025-12-04T10:32:26.2869054Z Synchronizing submodule url for 'third_party/googletest' 2025-12-04T10:32:26.2880278Z Synchronizing submodule url for 'third_party/ideep' 2025-12-04T10:32:26.2890780Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:26.2908516Z Synchronizing submodule url for 'third_party/ittapi' 2025-12-04T10:32:26.2919170Z Synchronizing submodule url for 'third_party/kineto' 2025-12-04T10:32:26.2930429Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:26.2940794Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:26.2951775Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:26.2962618Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:26.2973710Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:26.2982971Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:26.3000482Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:26.3012370Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:26.3021780Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:26.3031290Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:26.3040472Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:26.3051347Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:26.3062219Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:26.3077348Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:26.3087137Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:26.3099628Z Synchronizing submodule url for 
'third_party/kleidiai' 2025-12-04T10:32:26.3110624Z Synchronizing submodule url for 'third_party/mimalloc' 2025-12-04T10:32:26.3121057Z Synchronizing submodule url for 'third_party/nlohmann' 2025-12-04T10:32:26.3131737Z Synchronizing submodule url for 'third_party/onnx' 2025-12-04T10:32:26.3148195Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:26.3171243Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-12-04T10:32:26.3193741Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:26.3209783Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:26.3227032Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:26.3241004Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:26.3250321Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:26.3266078Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:26.3278064Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:26.3290632Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:26.3300530Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:26.3313500Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:26.3338629Z Synchronizing submodule url for 'third_party/pocketfft' 2025-12-04T10:32:26.3349246Z Synchronizing submodule url for 'third_party/protobuf' 2025-12-04T10:32:26.3367201Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:26.3381548Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:26.3396639Z Synchronizing submodule url for 'third_party/psimd' 2025-12-04T10:32:26.3408224Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-12-04T10:32:26.3419082Z Synchronizing submodule url for 'third_party/pybind11' 2025-12-04T10:32:26.3428196Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-12-04T10:32:26.3437376Z Synchronizing submodule url for 'third_party/sleef' 2025-12-04T10:32:26.3446555Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-12-04T10:32:26.3456830Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:26.3466361Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:26.3477082Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:26.3487665Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:26.3498014Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:26.3530803Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T10:32:26.3755316Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T10:32:26.3810489Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T10:32:26.3864276Z Submodule path 'third_party/FXdiv': checked out 
'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T10:32:26.3910168Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T10:32:26.3981926Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T10:32:26.4042170Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T10:32:26.4190301Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T10:32:26.4336640Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T10:32:26.4513292Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T10:32:26.4568580Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T10:32:26.4735642Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T10:32:26.4809817Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T10:32:26.4877470Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T10:32:26.4944603Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T10:32:26.5055147Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T10:32:26.5170096Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T10:32:26.5220382Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T10:32:26.5429939Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T10:32:26.5516666Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T10:32:26.5624630Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T10:32:26.5679567Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:26.5727727Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T10:32:26.5809888Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T10:32:26.5886160Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T10:32:26.6042366Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T10:32:26.6144740Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T10:32:26.6229548Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T10:32:26.6283820Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T10:32:26.6333869Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T10:32:26.6400944Z Submodule path 'third_party/gloo': checked out 
'54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T10:32:26.6453641Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:26.6504976Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T10:32:26.6670669Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T10:32:26.6726817Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T10:32:26.6810458Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T10:32:26.6889238Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T10:32:26.6956352Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T10:32:26.7021250Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T10:32:26.7081158Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T10:32:26.7133946Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T10:32:26.7181653Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T10:32:26.7247228Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T10:32:26.7314360Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:26.7397121Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T10:32:26.7447622Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T10:32:26.7542382Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T10:32:26.7637139Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T10:32:26.7710644Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T10:32:26.7773637Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T10:32:26.7827087Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T10:32:26.7909364Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T10:32:26.7982580Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T10:32:26.8069036Z Submodule path 'third_party/nlohmann': checked out 
'55f93686c01528224f448c19128836e7df245f72' 2025-12-04T10:32:26.8216737Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T10:32:26.8291946Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T10:32:26.8390084Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T10:32:26.8453183Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T10:32:26.8511951Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T10:32:26.8572420Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T10:32:26.8661695Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T10:32:26.8714063Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T10:32:26.8766239Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T10:32:26.8830732Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T10:32:26.8899620Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T10:32:26.8971682Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T10:32:26.9127935Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T10:32:26.9189418Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T10:32:26.9348013Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T10:32:26.9401380Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T10:32:26.9465905Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T10:32:26.9518223Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T10:32:26.9569017Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T10:32:26.9629932Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T10:32:26.9677998Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T10:32:26.9732443Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T10:32:26.9792691Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T10:32:26.9846967Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T10:32:26.9891855Z Submodule path 'third_party/tensorpipe/third_party/libnop': 
checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T10:32:27.0021355Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T10:32:27.0084390Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T10:32:27.0131164Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T10:32:27.0154137Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T10:32:27.0362198Z Entering 'android/libs/fbjni' 2025-12-04T10:32:27.0384516Z Entering 'third_party/FP16' 2025-12-04T10:32:27.0410159Z Entering 'third_party/FXdiv' 2025-12-04T10:32:27.0431784Z Entering 'third_party/NNPACK' 2025-12-04T10:32:27.0450677Z Entering 'third_party/NVTX' 2025-12-04T10:32:27.0470299Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:27.0491033Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:27.0523585Z Entering 'third_party/aiter' 2025-12-04T10:32:27.0544176Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:27.0566373Z Entering 'third_party/benchmark' 2025-12-04T10:32:27.0586040Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:27.0606952Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:27.0628756Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:27.0647258Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:27.0667145Z Entering 'third_party/cutlass' 2025-12-04T10:32:27.0691905Z Entering 'third_party/fbgemm' 2025-12-04T10:32:27.0713249Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:27.0733768Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:27.0758425Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:27.0777461Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:27.0799973Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:27.0818731Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:27.0837396Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:27.0858458Z Entering 'third_party/flash-attention' 2025-12-04T10:32:27.0885501Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:27.0908669Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:27.0938889Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:27.0958772Z Entering 'third_party/fmt' 2025-12-04T10:32:27.0977948Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:27.1004266Z Entering 'third_party/gloo' 2025-12-04T10:32:27.1024004Z Entering 'third_party/googletest' 2025-12-04T10:32:27.1048412Z Entering 'third_party/ideep' 2025-12-04T10:32:27.1067641Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:27.1091365Z Entering 'third_party/ittapi' 2025-12-04T10:32:27.1110729Z Entering 'third_party/kineto' 2025-12-04T10:32:27.1130229Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:27.1148003Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:27.1178812Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:27.1201673Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:27.1227713Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:27.1248328Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:27.1269725Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:27.1288489Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:27.1307534Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:27.1327438Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:27.1345338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:27.1362692Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.1381965Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.1405070Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:27.1428818Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:27.1449171Z Entering 'third_party/kleidiai' 2025-12-04T10:32:27.1467897Z Entering 'third_party/mimalloc' 2025-12-04T10:32:27.1496132Z Entering 'third_party/nlohmann' 2025-12-04T10:32:27.1515405Z Entering 'third_party/onnx' 2025-12-04T10:32:27.1552313Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:27.1574975Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:27.1595725Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:27.1615794Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:27.1643435Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:27.1663984Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:27.1683047Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:27.1701023Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:27.1720535Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:27.1745580Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.1765209Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.1784824Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:27.1811499Z Entering 'third_party/pocketfft' 2025-12-04T10:32:27.1835427Z Entering 'third_party/protobuf' 2025-12-04T10:32:27.1856602Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:27.1874806Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:27.1899825Z Entering 'third_party/psimd' 2025-12-04T10:32:27.1925956Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:27.1945588Z Entering 'third_party/pybind11' 2025-12-04T10:32:27.1970166Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:27.1990610Z Entering 'third_party/sleef' 2025-12-04T10:32:27.2026091Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:27.2060621Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:27.2093875Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:27.2121323Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:27.2152404Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:27.2176628Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:27.2215818Z 
##[endgroup] 2025-12-04T10:32:27.2216125Z ##[group]Persisting credentials for submodules 2025-12-04T10:32:27.2224450Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T10:32:27.2426847Z Entering 'android/libs/fbjni' 2025-12-04T10:32:27.2442517Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2443023Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2459878Z Entering 'third_party/FP16' 2025-12-04T10:32:27.2475276Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2475562Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2499631Z Entering 'third_party/FXdiv' 2025-12-04T10:32:27.2517741Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2517923Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2536349Z Entering 'third_party/NNPACK' 2025-12-04T10:32:27.2553822Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2553980Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2571669Z Entering 'third_party/NVTX' 2025-12-04T10:32:27.2584778Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2584952Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2601846Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:27.2616395Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2616695Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2637338Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:27.2651710Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2651915Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2680469Z Entering 'third_party/aiter' 2025-12-04T10:32:27.2694447Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2694632Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2715022Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:27.2731521Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2731694Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2754611Z Entering 'third_party/benchmark' 2025-12-04T10:32:27.2771728Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2771885Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2794082Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:27.2808459Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2808607Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2830835Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:27.2844619Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2844750Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2861965Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:27.2875147Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2875507Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2895855Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:27.2911896Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2912033Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2930270Z Entering 'third_party/cutlass' 2025-12-04T10:32:27.2942441Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2942721Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2971574Z Entering 'third_party/fbgemm' 2025-12-04T10:32:27.2986732Z url.https://github.com/.insteadof 2025-12-04T10:32:27.2986859Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3012947Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:27.3029695Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3029881Z 
url.https://github.com/.insteadof 2025-12-04T10:32:27.3057860Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:27.3070673Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3070813Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3094932Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:27.3108598Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3108722Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3127931Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:27.3141318Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3141568Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3160726Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:27.3172726Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3172903Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3194666Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:27.3210498Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3210679Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3228221Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:27.3241313Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3241498Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3259905Z Entering 'third_party/flash-attention' 2025-12-04T10:32:27.3278245Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3278413Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3296074Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:27.3310674Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3311161Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3326990Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:27.3338237Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3338508Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3359887Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:27.3372570Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3372792Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3391689Z Entering 'third_party/fmt' 2025-12-04T10:32:27.3404554Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3404762Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3427179Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:27.3440394Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3440588Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3456657Z Entering 'third_party/gloo' 2025-12-04T10:32:27.3470519Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3470706Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3494715Z Entering 'third_party/googletest' 2025-12-04T10:32:27.3507729Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3507916Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3527267Z Entering 'third_party/ideep' 2025-12-04T10:32:27.3540356Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3540526Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3558335Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:27.3573507Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3573675Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3595667Z Entering 'third_party/ittapi' 2025-12-04T10:32:27.3609296Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3609559Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3625629Z Entering 'third_party/kineto' 2025-12-04T10:32:27.3639048Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3639247Z 
url.https://github.com/.insteadof 2025-12-04T10:32:27.3655907Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:27.3667858Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3668157Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3685259Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:27.3697561Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3697729Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3715094Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:27.3731882Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3732206Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3749048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:27.3762934Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3763230Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3780225Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:27.3791936Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3792110Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3807869Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:27.3820185Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3820346Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3836745Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:27.3848619Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3848771Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3865230Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:27.3878593Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3878746Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3903751Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:27.3915247Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3915402Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3932180Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:27.3944535Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3944688Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3965978Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:27.3979931Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3980077Z url.https://github.com/.insteadof 2025-12-04T10:32:27.3998395Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.4013908Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4014247Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4033468Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.4052991Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4053120Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4074385Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:27.4086183Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4086325Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4104562Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:27.4117193Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4117322Z url.https://github.com/.insteadof 
2025-12-04T10:32:27.4136225Z Entering 'third_party/kleidiai' 2025-12-04T10:32:27.4148439Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4148569Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4164904Z Entering 'third_party/mimalloc' 2025-12-04T10:32:27.4177602Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4177885Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4195450Z Entering 'third_party/nlohmann' 2025-12-04T10:32:27.4206987Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4207201Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4237739Z Entering 'third_party/onnx' 2025-12-04T10:32:27.4239555Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4263554Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4263849Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:27.4277903Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4278142Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4301301Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:27.4315892Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4316041Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4336510Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:27.4348508Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4348665Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4364644Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:27.4378419Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4378566Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4394235Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:27.4407014Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4407167Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4423228Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:27.4436916Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4437061Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4454636Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:27.4467764Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4467914Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4482134Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:27.4492540Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4492669Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4511290Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:27.4523763Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4523890Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4538684Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.4551130Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4551253Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4571959Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.4584491Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4584618Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4604530Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:27.4616725Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4617042Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4643085Z Entering 'third_party/pocketfft' 2025-12-04T10:32:27.4656726Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4656848Z 
url.https://github.com/.insteadof 2025-12-04T10:32:27.4676966Z Entering 'third_party/protobuf' 2025-12-04T10:32:27.4689148Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4689269Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4707318Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:27.4722850Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4722974Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4737169Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:27.4749904Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4750028Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4769308Z Entering 'third_party/psimd' 2025-12-04T10:32:27.4781794Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4781908Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4798824Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:27.4810071Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4810189Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4825901Z Entering 'third_party/pybind11' 2025-12-04T10:32:27.4837742Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4837863Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4853695Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:27.4866620Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4866745Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4883029Z Entering 'third_party/sleef' 2025-12-04T10:32:27.4895720Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4895846Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4915374Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:27.4931972Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4932097Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4948606Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:27.4961231Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4961359Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4980687Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:27.4992814Z url.https://github.com/.insteadof 2025-12-04T10:32:27.4992940Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5014275Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:27.5030540Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5030672Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5046551Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:27.5059357Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5059483Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5074575Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:27.5089116Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5089239Z url.https://github.com/.insteadof 2025-12-04T10:32:27.5127674Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T10:32:27.5294564Z Entering 'android/libs/fbjni' 2025-12-04T10:32:27.5313948Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T10:32:27.5323948Z Entering 'third_party/FP16' 2025-12-04T10:32:27.5348101Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T10:32:27.5358974Z Entering 'third_party/FXdiv' 2025-12-04T10:32:27.5379110Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T10:32:27.5388864Z Entering 'third_party/NNPACK' 2025-12-04T10:32:27.5408228Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T10:32:27.5418212Z Entering 'third_party/NVTX' 2025-12-04T10:32:27.5436493Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T10:32:27.5450990Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:27.5470785Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T10:32:27.5480669Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:27.5499371Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T10:32:27.5516179Z Entering 'third_party/aiter' 2025-12-04T10:32:27.5534349Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T10:32:27.5545823Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:27.5564962Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T10:32:27.5579320Z Entering 'third_party/benchmark' 2025-12-04T10:32:27.5601032Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:27.5610665Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:27.5631430Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T10:32:27.5644085Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:27.5664160Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T10:32:27.5676799Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:27.5701611Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T10:32:27.5713465Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:27.5733222Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T10:32:27.5743201Z Entering 'third_party/cutlass' 2025-12-04T10:32:27.5764261Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T10:32:27.5777333Z Entering 'third_party/fbgemm' 2025-12-04T10:32:27.5796563Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T10:32:27.5807127Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:27.5843645Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T10:32:27.5855898Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:27.5877556Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T10:32:27.5890184Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:27.5909907Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T10:32:27.5919838Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:27.5939725Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T10:32:27.5951608Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:27.5973035Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T10:32:27.5982090Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:27.6001486Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T10:32:27.6014832Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:27.6037873Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T10:32:27.6049922Z Entering 'third_party/flash-attention' 2025-12-04T10:32:27.6069665Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T10:32:27.6079644Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:27.6098533Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T10:32:27.6109972Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:27.6129334Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T10:32:27.6144045Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:27.6161768Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T10:32:27.6173351Z Entering 'third_party/fmt' 2025-12-04T10:32:27.6195369Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:27.6205216Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:27.6224344Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T10:32:27.6234530Z Entering 'third_party/gloo' 2025-12-04T10:32:27.6254240Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T10:32:27.6269987Z Entering 'third_party/googletest' 2025-12-04T10:32:27.6299504Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:27.6309923Z Entering 'third_party/ideep' 2025-12-04T10:32:27.6328675Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T10:32:27.6338044Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:27.6378839Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T10:32:27.6392390Z Entering 'third_party/ittapi' 2025-12-04T10:32:27.6416858Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T10:32:27.6426815Z Entering 'third_party/kineto' 2025-12-04T10:32:27.6443486Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T10:32:27.6454382Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:27.6476291Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T10:32:27.6486653Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:27.6511114Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T10:32:27.6521698Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:27.6539763Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T10:32:27.6548941Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:27.6568424Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T10:32:27.6577576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:27.6602196Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T10:32:27.6615927Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:27.6641167Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T10:32:27.6653078Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:27.6675442Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T10:32:27.6684665Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:27.6702887Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:27.6714979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:27.6736951Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T10:32:27.6746937Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:27.6765800Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T10:32:27.6775798Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:27.6796765Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:27.6806173Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.6830134Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:27.6844785Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.6863196Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:27.6878804Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:27.6898200Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T10:32:27.6907572Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:27.6926616Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T10:32:27.6938081Z Entering 'third_party/kleidiai' 2025-12-04T10:32:27.6976592Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T10:32:27.6988829Z Entering 'third_party/mimalloc' 2025-12-04T10:32:27.7008000Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T10:32:27.7017966Z Entering 'third_party/nlohmann' 2025-12-04T10:32:27.7040173Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T10:32:27.7048101Z Entering 'third_party/onnx' 2025-12-04T10:32:27.7067045Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T10:32:27.7082771Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:27.7104452Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:27.7116961Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:27.7137108Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T10:32:27.7146654Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:27.7167139Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:27.7175298Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:27.7194552Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:27.7203491Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:27.7226576Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T10:32:27.7239683Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:27.7281265Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T10:32:27.7290934Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:27.7310637Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T10:32:27.7319412Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:27.7339561Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T10:32:27.7349232Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:27.7367593Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T10:32:27.7374840Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.7392313Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T10:32:27.7402095Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.7427302Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T10:32:27.7438524Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:27.7459140Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T10:32:27.7475935Z Entering 'third_party/pocketfft' 2025-12-04T10:32:27.7498818Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T10:32:27.7514094Z Entering 'third_party/protobuf' 2025-12-04T10:32:27.7533880Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T10:32:27.7543700Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:27.7562442Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T10:32:27.7571925Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:27.7589191Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:27.7607128Z Entering 'third_party/psimd' 2025-12-04T10:32:27.7627081Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T10:32:27.7637220Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:27.7662019Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T10:32:27.7672236Z Entering 'third_party/pybind11' 2025-12-04T10:32:27.7693102Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:27.7703822Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:27.7726384Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T10:32:27.7736353Z Entering 'third_party/sleef' 2025-12-04T10:32:27.7757135Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T10:32:27.7767242Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:27.7785322Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T10:32:27.7795817Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:27.7814763Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T10:32:27.7823790Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:27.7843087Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T10:32:27.7852974Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:27.7873041Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T10:32:27.7882958Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:27.7912127Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T10:32:27.7921857Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:27.7943002Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T10:32:27.8165297Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T10:32:27.8325824Z Entering 'android/libs/fbjni' 2025-12-04T10:32:27.8349715Z Entering 'third_party/FP16' 2025-12-04T10:32:27.8369331Z Entering 'third_party/FXdiv' 2025-12-04T10:32:27.8390307Z Entering 'third_party/NNPACK' 2025-12-04T10:32:27.8409134Z Entering 'third_party/NVTX' 2025-12-04T10:32:27.8428188Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:27.8446222Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:27.8471669Z Entering 'third_party/aiter' 2025-12-04T10:32:27.8492687Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:27.8534056Z Entering 'third_party/benchmark' 2025-12-04T10:32:27.8559454Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:27.8585083Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:27.8605286Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:27.8623720Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:27.8641958Z Entering 'third_party/cutlass' 2025-12-04T10:32:27.8663955Z Entering 'third_party/fbgemm' 2025-12-04T10:32:27.8684257Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:27.8703357Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:27.8726030Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:27.8745849Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:27.8767879Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:27.8786294Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:27.8805426Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:27.8825224Z Entering 'third_party/flash-attention' 2025-12-04T10:32:27.8847893Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:27.8873890Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:27.8896891Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:27.8917454Z Entering 'third_party/fmt' 2025-12-04T10:32:27.8938314Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:27.8962631Z Entering 'third_party/gloo' 2025-12-04T10:32:27.8981945Z Entering 'third_party/googletest' 2025-12-04T10:32:27.9001652Z Entering 'third_party/ideep' 2025-12-04T10:32:27.9021302Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:27.9045225Z Entering 'third_party/ittapi' 2025-12-04T10:32:27.9066326Z Entering 'third_party/kineto' 2025-12-04T10:32:27.9086136Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:27.9105069Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:27.9123482Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:27.9140547Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:27.9158535Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:27.9181532Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T10:32:27.9220315Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:27.9244083Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:27.9264503Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:27.9285399Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:27.9303303Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:27.9327053Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.9347236Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.9373025Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:27.9391676Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:27.9416391Z Entering 'third_party/kleidiai' 2025-12-04T10:32:27.9437851Z Entering 'third_party/mimalloc' 2025-12-04T10:32:27.9457454Z Entering 'third_party/nlohmann' 2025-12-04T10:32:27.9476660Z Entering 'third_party/onnx' 2025-12-04T10:32:27.9506924Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:27.9529893Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:27.9549373Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:27.9567017Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:27.9588118Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:27.9606223Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:27.9624561Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:27.9642988Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:27.9661893Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:27.9681097Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:27.9700857Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:27.9723230Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:27.9753931Z Entering 'third_party/pocketfft' 2025-12-04T10:32:27.9773505Z Entering 'third_party/protobuf' 2025-12-04T10:32:27.9793830Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:27.9818755Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:27.9843558Z Entering 'third_party/psimd' 2025-12-04T10:32:27.9863195Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:27.9891114Z Entering 'third_party/pybind11' 2025-12-04T10:32:27.9915683Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:27.9935777Z Entering 'third_party/sleef' 2025-12-04T10:32:27.9955575Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:27.9984002Z 
Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:28.0001567Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:28.0020125Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:28.0040766Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:28.0059385Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:28.0108837Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T10:32:28.0276201Z Entering 'android/libs/fbjni' 2025-12-04T10:32:28.0299768Z Entering 'third_party/FP16' 2025-12-04T10:32:28.0326598Z Entering 'third_party/FXdiv' 2025-12-04T10:32:28.0347982Z Entering 'third_party/NNPACK' 2025-12-04T10:32:28.0367884Z Entering 'third_party/NVTX' 2025-12-04T10:32:28.0388038Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T10:32:28.0411228Z Entering 'third_party/XNNPACK' 2025-12-04T10:32:28.0443836Z Entering 'third_party/aiter' 2025-12-04T10:32:28.0465251Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T10:32:28.0491419Z Entering 'third_party/benchmark' 2025-12-04T10:32:28.0511062Z Entering 'third_party/composable_kernel' 2025-12-04T10:32:28.0533356Z Entering 'third_party/cpp-httplib' 2025-12-04T10:32:28.0552768Z Entering 'third_party/cpuinfo' 2025-12-04T10:32:28.0573527Z Entering 'third_party/cudnn_frontend' 2025-12-04T10:32:28.0592991Z Entering 'third_party/cutlass' 2025-12-04T10:32:28.0618102Z Entering 'third_party/fbgemm' 2025-12-04T10:32:28.0640199Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T10:32:28.0659043Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T10:32:28.0681776Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T10:32:28.0700104Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T10:32:28.0722671Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T10:32:28.0741675Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T10:32:28.0759773Z Entering 'third_party/fbgemm/external/json' 2025-12-04T10:32:28.0779901Z Entering 'third_party/flash-attention' 2025-12-04T10:32:28.0799802Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T10:32:28.0819943Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T10:32:28.0842997Z Entering 'third_party/flatbuffers' 2025-12-04T10:32:28.0864196Z Entering 'third_party/fmt' 2025-12-04T10:32:28.0886615Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T10:32:28.0910198Z Entering 'third_party/gloo' 2025-12-04T10:32:28.0933649Z Entering 'third_party/googletest' 2025-12-04T10:32:28.0951662Z Entering 'third_party/ideep' 2025-12-04T10:32:28.0978311Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T10:32:28.1001706Z Entering 'third_party/ittapi' 2025-12-04T10:32:28.1020691Z Entering 'third_party/kineto' 2025-12-04T10:32:28.1043427Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T10:32:28.1062760Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T10:32:28.1081796Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T10:32:28.1099669Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T10:32:28.1120096Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T10:32:28.1138866Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 
2025-12-04T10:32:28.1160488Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T10:32:28.1178836Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T10:32:28.1196600Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T10:32:28.1215082Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T10:32:28.1232923Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T10:32:28.1251137Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:28.1272312Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:28.1298123Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T10:32:28.1322356Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T10:32:28.1346486Z Entering 'third_party/kleidiai' 2025-12-04T10:32:28.1370028Z Entering 'third_party/mimalloc' 2025-12-04T10:32:28.1389859Z Entering 'third_party/nlohmann' 2025-12-04T10:32:28.1410866Z Entering 'third_party/onnx' 2025-12-04T10:32:28.1439765Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T10:32:28.1466831Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T10:32:28.1487543Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T10:32:28.1505497Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T10:32:28.1522532Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T10:32:28.1539761Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T10:32:28.1558244Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T10:32:28.1575575Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T10:32:28.1594178Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T10:32:28.1613010Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T10:32:28.1648771Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T10:32:28.1672311Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T10:32:28.1698864Z Entering 'third_party/pocketfft' 2025-12-04T10:32:28.1717394Z Entering 'third_party/protobuf' 2025-12-04T10:32:28.1743676Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T10:32:28.1763683Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T10:32:28.1784767Z Entering 'third_party/psimd' 2025-12-04T10:32:28.1803707Z Entering 'third_party/pthreadpool' 2025-12-04T10:32:28.1822974Z Entering 'third_party/pybind11' 2025-12-04T10:32:28.1843055Z Entering 'third_party/python-peachpy' 2025-12-04T10:32:28.1864215Z Entering 'third_party/sleef' 2025-12-04T10:32:28.1884915Z Entering 'third_party/tensorpipe' 2025-12-04T10:32:28.1903899Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T10:32:28.1924832Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T10:32:28.1950492Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T10:32:28.1970924Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T10:32:28.1988469Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T10:32:28.2020076Z ##[endgroup] 2025-12-04T10:32:28.2450816Z [command]/usr/bin/git log -1 
--format=%H 2025-12-04T10:32:28.2936957Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:28.3070838Z Prepare all required actions 2025-12-04T10:32:28.3071168Z Getting action download info 2025-12-04T10:32:28.5460759Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076) 2025-12-04T10:32:29.4359857Z ##[group]Run ./.github/actions/setup-rocm 2025-12-04T10:32:29.4359993Z env: 2025-12-04T10:32:29.4360077Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.4360188Z ##[endgroup] 2025-12-04T10:32:29.4371296Z ##[group]Run dpkg -l | grep -E " rocm" 2025-12-04T10:32:29.4371430Z dpkg -l | grep -E " rocm" 2025-12-04T10:32:29.4374660Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.4374798Z env: 2025-12-04T10:32:29.4374881Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.4374982Z ##[endgroup] 2025-12-04T10:32:29.4435158Z ii rocm-cmake 0.14.0.60401-83~22.04 amd64 rocm-cmake built using CMake 2025-12-04T10:32:29.4435626Z ii rocm-core 6.4.1.60401-83~22.04 amd64 ROCm Runtime software stack 2025-12-04T10:32:29.4436008Z ii rocm-dbgapi 0.77.2.60401-83~22.04 amd64 Library to provide AMD GPU debugger API 2025-12-04T10:32:29.4436428Z ii rocm-debug-agent 2.0.4.60401-83~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent) 2025-12-04T10:32:29.4437212Z ii rocm-dev 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-12-04T10:32:29.4437653Z ii rocm-device-libs 1.0.0.60401-83~22.04 amd64 Radeon Open Compute - device libraries 2025-12-04T10:32:29.4438013Z ii rocm-gdb 15.2.60401-83~22.04 amd64 ROCgdb 2025-12-04T10:32:29.4438342Z ii rocm-llvm 19.0.0.25184.60401-83~22.04 amd64 ROCm core compiler 2025-12-04T10:32:29.4438690Z ii rocm-opencl 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-12-04T10:32:29.4439033Z ii rocm-opencl-dev 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-12-04T10:32:29.4439391Z ii rocm-smi-lib 7.5.0.60401-83~22.04 amd64 AMD System Management libraries 2025-12-04T10:32:29.4439858Z ii rocm-utils 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-12-04T10:32:29.4440256Z ii rocminfo 1.0.0.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool 2025-12-04T10:32:29.4458325Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T10:32:29.4458647Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T10:32:29.4458837Z # shellcheck disable=SC2046 2025-12-04T10:32:29.4458995Z docker stop $(docker ps -q) || true 2025-12-04T10:32:29.4459145Z # Prune all stopped containers. 2025-12-04T10:32:29.4459290Z docker container prune -f 2025-12-04T10:32:29.4463731Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.4463875Z env: 2025-12-04T10:32:29.4463965Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.4464071Z ##[endgroup] 2025-12-04T10:32:29.4649803Z docker: 'docker stop' requires at least 1 argument 2025-12-04T10:32:29.4649924Z 2025-12-04T10:32:29.4649997Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
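The 'docker stop' usage error above is expected on an idle runner: "docker ps -q" expanded to nothing, so docker stop received no container argument, and the "|| true" keeps the step from failing. A minimal sketch of an equivalent cleanup that avoids the error entirely, assuming GNU xargs with its -r (--no-run-if-empty) flag, would be:

  # Stop running containers only if any exist, then remove all stopped containers.
  docker ps -q | xargs -r docker stop
  docker container prune -f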
2025-12-04T10:32:29.4650106Z 2025-12-04T10:32:29.4650176Z See 'docker stop --help' for more information 2025-12-04T10:32:29.4733879Z Total reclaimed space: 0B 2025-12-04T10:32:29.4763783Z ##[group]Run cat /etc/os-release || true 2025-12-04T10:32:29.4764027Z cat /etc/os-release || true 2025-12-04T10:32:29.4764211Z cat /etc/apt/sources.list.d/rocm.list || true 2025-12-04T10:32:29.4764586Z cat /opt/rocm/.info/version || true 2025-12-04T10:32:29.4764749Z whoami 2025-12-04T10:32:29.4769619Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.4769809Z env: 2025-12-04T10:32:29.4769931Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.4770059Z ##[endgroup] 2025-12-04T10:32:29.4789797Z PRETTY_NAME="Ubuntu 22.04.5 LTS" 2025-12-04T10:32:29.4789924Z NAME="Ubuntu" 2025-12-04T10:32:29.4790035Z VERSION_ID="22.04" 2025-12-04T10:32:29.4790137Z VERSION="22.04.5 LTS (Jammy Jellyfish)" 2025-12-04T10:32:29.4790258Z VERSION_CODENAME=jammy 2025-12-04T10:32:29.4790356Z ID=ubuntu 2025-12-04T10:32:29.4790439Z ID_LIKE=debian 2025-12-04T10:32:29.4790563Z HOME_URL="https://www.ubuntu.com/" 2025-12-04T10:32:29.4790692Z SUPPORT_URL="https://help.ubuntu.com/" 2025-12-04T10:32:29.4790843Z BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" 2025-12-04T10:32:29.4791054Z PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 2025-12-04T10:32:29.4791242Z UBUNTU_CODENAME=jammy 2025-12-04T10:32:29.4796201Z deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.1 jammy main 2025-12-04T10:32:29.4802827Z 6.4.1-83 2025-12-04T10:32:29.4808707Z runner 2025-12-04T10:32:29.4828204Z ##[group]Run dpkg -l | grep -E " amdgpu" 2025-12-04T10:32:29.4828447Z dpkg -l | grep -E " amdgpu" 2025-12-04T10:32:29.4833259Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.4833579Z env: 2025-12-04T10:32:29.4833675Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.4833793Z ##[endgroup] 2025-12-04T10:32:29.4883205Z ii amdgpu-core 1:6.4.60401-2164967.22.04 all Core meta package for unified amdgpu driver. 
2025-12-04T10:32:29.4883475Z ii amdgpu-install 6.4.60401-2164967.22.04 all AMDGPU driver repository and installer 2025-12-04T10:32:29.4897053Z ##[group]Run rocm-smi 2025-12-04T10:32:29.4897205Z rocm-smi 2025-12-04T10:32:29.4901376Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.4901596Z env: 2025-12-04T10:32:29.4901701Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.4901816Z ##[endgroup] 2025-12-04T10:32:29.5531556Z 2025-12-04T10:32:29.5531716Z 2025-12-04T10:32:29.5532088Z ============================================ ROCm System Management Interface ============================================ 2025-12-04T10:32:29.5532678Z ====================================================== Concise Info ====================================================== 2025-12-04T10:32:29.5533284Z Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2025-12-04T10:32:29.5534176Z  (DID, GUID) (Junction) (Socket) (Mem, Compute, ID)  2025-12-04T10:32:29.5534685Z ========================================================================================================================== 2025-12-04T10:32:29.5535598Z 0 3 0x74a5, 51110 30.0°C 114.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T10:32:29.5536315Z 1 5 0x74a5, 2987 27.0°C 115.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T10:32:29.5536999Z 2 4 0x74a5, 61326 27.0°C 123.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T10:32:29.5537686Z 3 2 0x74a5, 9091 28.0°C 127.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T10:32:29.5538176Z ========================================================================================================================== 2025-12-04T10:32:29.5538610Z ================================================== End of ROCm SMI Log =================================================== 2025-12-04T10:32:29.5593821Z ##[group]Run rocminfo 2025-12-04T10:32:29.5593950Z rocminfo 2025-12-04T10:32:29.5597647Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.5597793Z env: 2025-12-04T10:32:29.5597884Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.5597995Z ##[endgroup] 2025-12-04T10:32:29.6494753Z ROCk module version 6.12.12 is loaded 2025-12-04T10:32:29.6494899Z ===================== 2025-12-04T10:32:29.6495066Z HSA System Attributes 2025-12-04T10:32:29.6495164Z ===================== 2025-12-04T10:32:29.6495294Z Runtime Version: 1.15 2025-12-04T10:32:29.6495410Z Runtime Ext Version: 1.7 2025-12-04T10:32:29.6495527Z System Timestamp Freq.: 1000.000000MHz 2025-12-04T10:32:29.6495709Z Sig. 
Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-12-04T10:32:29.6495952Z Machine Model: LARGE 2025-12-04T10:32:29.6496213Z System Endianness: LITTLE 2025-12-04T10:32:29.6496408Z Mwaitx: DISABLED 2025-12-04T10:32:29.6496524Z XNACK enabled: NO 2025-12-04T10:32:29.6496663Z DMAbuf Support: YES 2025-12-04T10:32:29.6496767Z VMM Support: YES 2025-12-04T10:32:29.6496838Z 2025-12-04T10:32:29.6496909Z ========== 2025-12-04T10:32:29.6497012Z HSA Agents 2025-12-04T10:32:29.6497103Z ========== 2025-12-04T10:32:29.6497194Z ******* 2025-12-04T10:32:29.6497281Z Agent 1 2025-12-04T10:32:29.6497553Z ******* 2025-12-04T10:32:29.6497671Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:32:29.6497889Z Uuid: CPU-XX 2025-12-04T10:32:29.6498044Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:32:29.6498209Z Vendor Name: CPU 2025-12-04T10:32:29.6498352Z Feature: None specified 2025-12-04T10:32:29.6498506Z Profile: FULL_PROFILE 2025-12-04T10:32:29.6498690Z Float Round Mode: NEAR 2025-12-04T10:32:29.6498849Z Max Queue Number: 0(0x0) 2025-12-04T10:32:29.6499012Z Queue Min Size: 0(0x0) 2025-12-04T10:32:29.6499162Z Queue Max Size: 0(0x0) 2025-12-04T10:32:29.6499341Z Queue Type: MULTI 2025-12-04T10:32:29.6499535Z Node: 0 2025-12-04T10:32:29.6499717Z Device Type: CPU 2025-12-04T10:32:29.6499856Z Cache Info: 2025-12-04T10:32:29.6499968Z L1: 49152(0xc000) KB 2025-12-04T10:32:29.6500106Z Chip ID: 0(0x0) 2025-12-04T10:32:29.6500309Z ASIC Revision: 0(0x0) 2025-12-04T10:32:29.6500462Z Cacheline Size: 64(0x40) 2025-12-04T10:32:29.6500615Z Max Clock Freq. (MHz): 3300 2025-12-04T10:32:29.6500753Z BDFID: 0 2025-12-04T10:32:29.6500890Z Internal Node ID: 0 2025-12-04T10:32:29.6501034Z Compute Unit: 64 2025-12-04T10:32:29.6501184Z SIMDs per CU: 0 2025-12-04T10:32:29.6501325Z Shader Engines: 0 2025-12-04T10:32:29.6501476Z Shader Arrs. per Eng.: 0 2025-12-04T10:32:29.6501727Z WatchPts on Addr. 
Ranges:1 2025-12-04T10:32:29.6501875Z Memory Properties: 2025-12-04T10:32:29.6501986Z Features: None 2025-12-04T10:32:29.6502090Z Pool Info: 2025-12-04T10:32:29.6502275Z Pool 1 2025-12-04T10:32:29.6502457Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:32:29.6502688Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:32:29.6502840Z Allocatable: TRUE 2025-12-04T10:32:29.6503122Z Alloc Granule: 4KB 2025-12-04T10:32:29.6503375Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6503544Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6503734Z Accessible by all: TRUE 2025-12-04T10:32:29.6503871Z Pool 2 2025-12-04T10:32:29.6504005Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:32:29.6504152Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:32:29.6504353Z Allocatable: TRUE 2025-12-04T10:32:29.6504521Z Alloc Granule: 4KB 2025-12-04T10:32:29.6504722Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6504913Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6505068Z Accessible by all: TRUE 2025-12-04T10:32:29.6505264Z Pool 3 2025-12-04T10:32:29.6505394Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T10:32:29.6505539Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:32:29.6505707Z Allocatable: TRUE 2025-12-04T10:32:29.6505870Z Alloc Granule: 4KB 2025-12-04T10:32:29.6506026Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6506193Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6506347Z Accessible by all: TRUE 2025-12-04T10:32:29.6506497Z Pool 4 2025-12-04T10:32:29.6506679Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:32:29.6506821Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:32:29.6506982Z Allocatable: TRUE 2025-12-04T10:32:29.6507186Z Alloc Granule: 4KB 2025-12-04T10:32:29.6507358Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6507541Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6507691Z Accessible by all: TRUE 2025-12-04T10:32:29.6507827Z ISA Info: 2025-12-04T10:32:29.6507937Z ******* 2025-12-04T10:32:29.6508066Z Agent 2 2025-12-04T10:32:29.6508167Z ******* 2025-12-04T10:32:29.6508286Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:32:29.6508426Z Uuid: CPU-XX 2025-12-04T10:32:29.6508578Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:32:29.6508794Z Vendor Name: CPU 2025-12-04T10:32:29.6509004Z Feature: None specified 2025-12-04T10:32:29.6509174Z Profile: FULL_PROFILE 2025-12-04T10:32:29.6509370Z Float Round Mode: NEAR 2025-12-04T10:32:29.6509617Z Max Queue Number: 0(0x0) 2025-12-04T10:32:29.6509838Z Queue Min Size: 0(0x0) 2025-12-04T10:32:29.6509985Z Queue Max Size: 0(0x0) 2025-12-04T10:32:29.6510230Z Queue Type: MULTI 2025-12-04T10:32:29.6510377Z Node: 1 2025-12-04T10:32:29.6510592Z Device Type: CPU 2025-12-04T10:32:29.6510746Z Cache Info: 2025-12-04T10:32:29.6510862Z L1: 49152(0xc000) KB 2025-12-04T10:32:29.6511061Z Chip ID: 0(0x0) 2025-12-04T10:32:29.6511204Z ASIC Revision: 0(0x0) 2025-12-04T10:32:29.6511360Z Cacheline Size: 64(0x40) 2025-12-04T10:32:29.6511558Z Max Clock Freq. (MHz): 3300 2025-12-04T10:32:29.6511701Z BDFID: 0 2025-12-04T10:32:29.6511845Z Internal Node ID: 1 2025-12-04T10:32:29.6512024Z Compute Unit: 64 2025-12-04T10:32:29.6512211Z SIMDs per CU: 0 2025-12-04T10:32:29.6512408Z Shader Engines: 0 2025-12-04T10:32:29.6512558Z Shader Arrs. per Eng.: 0 2025-12-04T10:32:29.6512791Z WatchPts on Addr. 
Ranges:1 2025-12-04T10:32:29.6512927Z Memory Properties: 2025-12-04T10:32:29.6513147Z Features: None 2025-12-04T10:32:29.6513263Z Pool Info: 2025-12-04T10:32:29.6513362Z Pool 1 2025-12-04T10:32:29.6513532Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:32:29.6513707Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:32:29.6513881Z Allocatable: TRUE 2025-12-04T10:32:29.6514035Z Alloc Granule: 4KB 2025-12-04T10:32:29.6514206Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6514475Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6514634Z Accessible by all: TRUE 2025-12-04T10:32:29.6514770Z Pool 2 2025-12-04T10:32:29.6514900Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:32:29.6515084Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:32:29.6515232Z Allocatable: TRUE 2025-12-04T10:32:29.6515448Z Alloc Granule: 4KB 2025-12-04T10:32:29.6515621Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6515803Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6516012Z Accessible by all: TRUE 2025-12-04T10:32:29.6516226Z Pool 3 2025-12-04T10:32:29.6516360Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T10:32:29.6516521Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:32:29.6516674Z Allocatable: TRUE 2025-12-04T10:32:29.6516903Z Alloc Granule: 4KB 2025-12-04T10:32:29.6517080Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6517295Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6517450Z Accessible by all: TRUE 2025-12-04T10:32:29.6517582Z Pool 4 2025-12-04T10:32:29.6517741Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:32:29.6517886Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:32:29.6518069Z Allocatable: TRUE 2025-12-04T10:32:29.6518299Z Alloc Granule: 4KB 2025-12-04T10:32:29.6518477Z Alloc Recommended Granule:4KB 2025-12-04T10:32:29.6518693Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6518864Z Accessible by all: TRUE 2025-12-04T10:32:29.6518998Z ISA Info: 2025-12-04T10:32:29.6519171Z ******* 2025-12-04T10:32:29.6519344Z Agent 3 2025-12-04T10:32:29.6519439Z ******* 2025-12-04T10:32:29.6519551Z Name: gfx942 2025-12-04T10:32:29.6519745Z Uuid: GPU-41f9686c3d70a95c 2025-12-04T10:32:29.6519894Z Marketing Name: AMD Instinct MI325X 2025-12-04T10:32:29.6520072Z Vendor Name: AMD 2025-12-04T10:32:29.6520224Z Feature: KERNEL_DISPATCH 2025-12-04T10:32:29.6520415Z Profile: BASE_PROFILE 2025-12-04T10:32:29.6520589Z Float Round Mode: NEAR 2025-12-04T10:32:29.6520751Z Max Queue Number: 128(0x80) 2025-12-04T10:32:29.6520904Z Queue Min Size: 64(0x40) 2025-12-04T10:32:29.6521086Z Queue Max Size: 131072(0x20000) 2025-12-04T10:32:29.6521228Z Queue Type: MULTI 2025-12-04T10:32:29.6521370Z Node: 2 2025-12-04T10:32:29.6521506Z Device Type: GPU 2025-12-04T10:32:29.6521640Z Cache Info: 2025-12-04T10:32:29.6521755Z L1: 32(0x20) KB 2025-12-04T10:32:29.6521890Z L2: 4096(0x1000) KB 2025-12-04T10:32:29.6522024Z L3: 262144(0x40000) KB 2025-12-04T10:32:29.6522191Z Chip ID: 29861(0x74a5) 2025-12-04T10:32:29.6522386Z ASIC Revision: 1(0x1) 2025-12-04T10:32:29.6522564Z Cacheline Size: 128(0x80) 2025-12-04T10:32:29.6522712Z Max Clock Freq. (MHz): 2100 2025-12-04T10:32:29.6522913Z BDFID: 29952 2025-12-04T10:32:29.6523057Z Internal Node ID: 2 2025-12-04T10:32:29.6523254Z Compute Unit: 304 2025-12-04T10:32:29.6523468Z SIMDs per CU: 4 2025-12-04T10:32:29.6523655Z Shader Engines: 32 2025-12-04T10:32:29.6523803Z Shader Arrs. per Eng.: 1 2025-12-04T10:32:29.6524019Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:32:29.6524189Z Coherent Host Access: FALSE 2025-12-04T10:32:29.6524367Z Memory Properties: 2025-12-04T10:32:29.6524514Z Features: KERNEL_DISPATCH 2025-12-04T10:32:29.6524654Z Fast F16 Operation: TRUE 2025-12-04T10:32:29.6524905Z Wavefront Size: 64(0x40) 2025-12-04T10:32:29.6525088Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6525228Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6525387Z x 1024(0x400) 2025-12-04T10:32:29.6525553Z y 1024(0x400) 2025-12-04T10:32:29.6525714Z z 1024(0x400) 2025-12-04T10:32:29.6525853Z Max Waves Per CU: 32(0x20) 2025-12-04T10:32:29.6526070Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:32:29.6526232Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6526409Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6526523Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6526661Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6526865Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6527033Z Max fbarriers/Workgrp: 32 2025-12-04T10:32:29.6532809Z Packet Processor uCode:: 185 2025-12-04T10:32:29.6532974Z SDMA engine uCode:: 24 2025-12-04T10:32:29.6533139Z IOMMU Support:: None 2025-12-04T10:32:29.6533338Z Pool Info: 2025-12-04T10:32:29.6533478Z Pool 1 2025-12-04T10:32:29.6533678Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:32:29.6533844Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6533992Z Allocatable: TRUE 2025-12-04T10:32:29.6534242Z Alloc Granule: 4KB 2025-12-04T10:32:29.6534404Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6534646Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6534837Z Accessible by all: FALSE 2025-12-04T10:32:29.6535003Z Pool 2 2025-12-04T10:32:29.6535135Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:32:29.6535287Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6535432Z Allocatable: TRUE 2025-12-04T10:32:29.6535588Z Alloc Granule: 4KB 2025-12-04T10:32:29.6535899Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6536056Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6536365Z Accessible by all: FALSE 2025-12-04T10:32:29.6536499Z Pool 3 2025-12-04T10:32:29.6536662Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:32:29.6536809Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6537029Z Allocatable: TRUE 2025-12-04T10:32:29.6537270Z Alloc Granule: 4KB 2025-12-04T10:32:29.6537477Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6537700Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6537908Z Accessible by all: FALSE 2025-12-04T10:32:29.6538041Z Pool 4 2025-12-04T10:32:29.6538199Z Segment: GROUP 2025-12-04T10:32:29.6538366Z Size: 64(0x40) KB 2025-12-04T10:32:29.6538504Z Allocatable: FALSE 2025-12-04T10:32:29.6538724Z Alloc Granule: 0KB 2025-12-04T10:32:29.6538882Z Alloc Recommended Granule:0KB 2025-12-04T10:32:29.6539078Z Alloc Alignment: 0KB 2025-12-04T10:32:29.6539264Z Accessible by all: FALSE 2025-12-04T10:32:29.6539480Z ISA Info: 2025-12-04T10:32:29.6539625Z ISA 1 2025-12-04T10:32:29.6539799Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:32:29.6539960Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6540120Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6540280Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6540503Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6540656Z Fast f16: TRUE 2025-12-04T10:32:29.6540812Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6540953Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6541085Z x 1024(0x400) 2025-12-04T10:32:29.6541231Z y 1024(0x400) 2025-12-04T10:32:29.6541399Z z 1024(0x400) 
2025-12-04T10:32:29.6541605Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6541779Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6541902Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6542034Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6542224Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6542369Z FBarrier Max Size: 32 2025-12-04T10:32:29.6542537Z ISA 2 2025-12-04T10:32:29.6542759Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:32:29.6542990Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6543211Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6543372Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6543555Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6543704Z Fast f16: TRUE 2025-12-04T10:32:29.6543875Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6544077Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6544223Z x 1024(0x400) 2025-12-04T10:32:29.6544352Z y 1024(0x400) 2025-12-04T10:32:29.6544497Z z 1024(0x400) 2025-12-04T10:32:29.6544674Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6544812Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6544962Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6545095Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6545226Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6545454Z FBarrier Max Size: 32 2025-12-04T10:32:29.6545612Z ******* 2025-12-04T10:32:29.6545708Z Agent 4 2025-12-04T10:32:29.6545836Z ******* 2025-12-04T10:32:29.6545994Z Name: gfx942 2025-12-04T10:32:29.6546131Z Uuid: GPU-e2954cd4b2ef3669 2025-12-04T10:32:29.6546291Z Marketing Name: AMD Instinct MI325X 2025-12-04T10:32:29.6546447Z Vendor Name: AMD 2025-12-04T10:32:29.6546601Z Feature: KERNEL_DISPATCH 2025-12-04T10:32:29.6546826Z Profile: BASE_PROFILE 2025-12-04T10:32:29.6547004Z Float Round Mode: NEAR 2025-12-04T10:32:29.6547156Z Max Queue Number: 128(0x80) 2025-12-04T10:32:29.6547411Z Queue Min Size: 64(0x40) 2025-12-04T10:32:29.6547572Z Queue Max Size: 131072(0x20000) 2025-12-04T10:32:29.6547770Z Queue Type: MULTI 2025-12-04T10:32:29.6547909Z Node: 3 2025-12-04T10:32:29.6561746Z Device Type: GPU 2025-12-04T10:32:29.6561911Z Cache Info: 2025-12-04T10:32:29.6562039Z L1: 32(0x20) KB 2025-12-04T10:32:29.6562183Z L2: 4096(0x1000) KB 2025-12-04T10:32:29.6562321Z L3: 262144(0x40000) KB 2025-12-04T10:32:29.6562460Z Chip ID: 29861(0x74a5) 2025-12-04T10:32:29.6562608Z ASIC Revision: 1(0x1) 2025-12-04T10:32:29.6562759Z Cacheline Size: 128(0x80) 2025-12-04T10:32:29.6562912Z Max Clock Freq. (MHz): 2100 2025-12-04T10:32:29.6563055Z BDFID: 1280 2025-12-04T10:32:29.6563197Z Internal Node ID: 3 2025-12-04T10:32:29.6563356Z Compute Unit: 304 2025-12-04T10:32:29.6563507Z SIMDs per CU: 4 2025-12-04T10:32:29.6563721Z Shader Engines: 32 2025-12-04T10:32:29.6563885Z Shader Arrs. per Eng.: 1 2025-12-04T10:32:29.6564053Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:32:29.6564213Z Coherent Host Access: FALSE 2025-12-04T10:32:29.6564359Z Memory Properties: 2025-12-04T10:32:29.6564477Z Features: KERNEL_DISPATCH 2025-12-04T10:32:29.6564622Z Fast F16 Operation: TRUE 2025-12-04T10:32:29.6564778Z Wavefront Size: 64(0x40) 2025-12-04T10:32:29.6564928Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6565077Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6565213Z x 1024(0x400) 2025-12-04T10:32:29.6565338Z y 1024(0x400) 2025-12-04T10:32:29.6565469Z z 1024(0x400) 2025-12-04T10:32:29.6565605Z Max Waves Per CU: 32(0x20) 2025-12-04T10:32:29.6565761Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:32:29.6565915Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6566048Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6566168Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6566306Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6566429Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6566577Z Max fbarriers/Workgrp: 32 2025-12-04T10:32:29.6566743Z Packet Processor uCode:: 185 2025-12-04T10:32:29.6566914Z SDMA engine uCode:: 24 2025-12-04T10:32:29.6567071Z IOMMU Support:: None 2025-12-04T10:32:29.6567202Z Pool Info: 2025-12-04T10:32:29.6567311Z Pool 1 2025-12-04T10:32:29.6567443Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:32:29.6567594Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6567744Z Allocatable: TRUE 2025-12-04T10:32:29.6567901Z Alloc Granule: 4KB 2025-12-04T10:32:29.6568110Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6568276Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6568431Z Accessible by all: FALSE 2025-12-04T10:32:29.6568571Z Pool 2 2025-12-04T10:32:29.6568705Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:32:29.6568854Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6569001Z Allocatable: TRUE 2025-12-04T10:32:29.6569154Z Alloc Granule: 4KB 2025-12-04T10:32:29.6569311Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6569469Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6569659Z Accessible by all: FALSE 2025-12-04T10:32:29.6569803Z Pool 3 2025-12-04T10:32:29.6569931Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:32:29.6570072Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6570212Z Allocatable: TRUE 2025-12-04T10:32:29.6570368Z Alloc Granule: 4KB 2025-12-04T10:32:29.6570556Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6570718Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6570876Z Accessible by all: FALSE 2025-12-04T10:32:29.6571004Z Pool 4 2025-12-04T10:32:29.6571125Z Segment: GROUP 2025-12-04T10:32:29.6571262Z Size: 64(0x40) KB 2025-12-04T10:32:29.6571414Z Allocatable: FALSE 2025-12-04T10:32:29.6571568Z Alloc Granule: 0KB 2025-12-04T10:32:29.6571721Z Alloc Recommended Granule:0KB 2025-12-04T10:32:29.6571879Z Alloc Alignment: 0KB 2025-12-04T10:32:29.6572035Z Accessible by all: FALSE 2025-12-04T10:32:29.6572170Z ISA Info: 2025-12-04T10:32:29.6572268Z ISA 1 2025-12-04T10:32:29.6572393Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:32:29.6572559Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6572722Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6572875Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6573043Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6573196Z Fast f16: TRUE 2025-12-04T10:32:29.6573343Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6573484Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6573610Z x 1024(0x400) 2025-12-04T10:32:29.6573743Z y 1024(0x400) 2025-12-04T10:32:29.6573872Z z 1024(0x400) 
2025-12-04T10:32:29.6574006Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6574141Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6574258Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6574382Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6574553Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6574695Z FBarrier Max Size: 32 2025-12-04T10:32:29.6574828Z ISA 2 2025-12-04T10:32:29.6574974Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:32:29.6575144Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6575305Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6575469Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6575628Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6575782Z Fast f16: TRUE 2025-12-04T10:32:29.6575934Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6576075Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6576204Z x 1024(0x400) 2025-12-04T10:32:29.6576335Z y 1024(0x400) 2025-12-04T10:32:29.6576465Z z 1024(0x400) 2025-12-04T10:32:29.6576605Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6576741Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6576864Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6577047Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6577174Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6577320Z FBarrier Max Size: 32 2025-12-04T10:32:29.6577458Z ******* 2025-12-04T10:32:29.6577557Z Agent 5 2025-12-04T10:32:29.6577658Z ******* 2025-12-04T10:32:29.6577770Z Name: gfx942 2025-12-04T10:32:29.6577921Z Uuid: GPU-d34a48edc983a6e7 2025-12-04T10:32:29.6578077Z Marketing Name: AMD Instinct MI325X 2025-12-04T10:32:29.6578231Z Vendor Name: AMD 2025-12-04T10:32:29.6578385Z Feature: KERNEL_DISPATCH 2025-12-04T10:32:29.6578540Z Profile: BASE_PROFILE 2025-12-04T10:32:29.6578699Z Float Round Mode: NEAR 2025-12-04T10:32:29.6578856Z Max Queue Number: 128(0x80) 2025-12-04T10:32:29.6579004Z Queue Min Size: 64(0x40) 2025-12-04T10:32:29.6579151Z Queue Max Size: 131072(0x20000) 2025-12-04T10:32:29.6579301Z Queue Type: MULTI 2025-12-04T10:32:29.6579437Z Node: 4 2025-12-04T10:32:29.6579629Z Device Type: GPU 2025-12-04T10:32:29.6579764Z Cache Info: 2025-12-04T10:32:29.6579876Z L1: 32(0x20) KB 2025-12-04T10:32:29.6580012Z L2: 4096(0x1000) KB 2025-12-04T10:32:29.6580139Z L3: 262144(0x40000) KB 2025-12-04T10:32:29.6580276Z Chip ID: 29861(0x74a5) 2025-12-04T10:32:29.6580429Z ASIC Revision: 1(0x1) 2025-12-04T10:32:29.6580578Z Cacheline Size: 128(0x80) 2025-12-04T10:32:29.6580734Z Max Clock Freq. (MHz): 2100 2025-12-04T10:32:29.6580879Z BDFID: 25856 2025-12-04T10:32:29.6581021Z Internal Node ID: 4 2025-12-04T10:32:29.6581239Z Compute Unit: 304 2025-12-04T10:32:29.6581385Z SIMDs per CU: 4 2025-12-04T10:32:29.6581537Z Shader Engines: 32 2025-12-04T10:32:29.6581693Z Shader Arrs. per Eng.: 1 2025-12-04T10:32:29.6581851Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:32:29.6582013Z Coherent Host Access: FALSE 2025-12-04T10:32:29.6582156Z Memory Properties: 2025-12-04T10:32:29.6582270Z Features: KERNEL_DISPATCH 2025-12-04T10:32:29.6582412Z Fast F16 Operation: TRUE 2025-12-04T10:32:29.6582566Z Wavefront Size: 64(0x40) 2025-12-04T10:32:29.6582722Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6582864Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6582990Z x 1024(0x400) 2025-12-04T10:32:29.6583119Z y 1024(0x400) 2025-12-04T10:32:29.6583246Z z 1024(0x400) 2025-12-04T10:32:29.6583383Z Max Waves Per CU: 32(0x20) 2025-12-04T10:32:29.6583535Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:32:29.6583691Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6583866Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6583981Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6584110Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6584237Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6584385Z Max fbarriers/Workgrp: 32 2025-12-04T10:32:29.6584544Z Packet Processor uCode:: 185 2025-12-04T10:32:29.6584708Z SDMA engine uCode:: 24 2025-12-04T10:32:29.6584863Z IOMMU Support:: None 2025-12-04T10:32:29.6584993Z Pool Info: 2025-12-04T10:32:29.6585102Z Pool 1 2025-12-04T10:32:29.6585231Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:32:29.6585384Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6585540Z Allocatable: TRUE 2025-12-04T10:32:29.6585688Z Alloc Granule: 4KB 2025-12-04T10:32:29.6585852Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6586015Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6586170Z Accessible by all: FALSE 2025-12-04T10:32:29.6586310Z Pool 2 2025-12-04T10:32:29.6586436Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:32:29.6586585Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6586732Z Allocatable: TRUE 2025-12-04T10:32:29.6586881Z Alloc Granule: 4KB 2025-12-04T10:32:29.6587045Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6587214Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6587370Z Accessible by all: FALSE 2025-12-04T10:32:29.6587511Z Pool 3 2025-12-04T10:32:29.6587641Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:32:29.6587787Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6587936Z Allocatable: TRUE 2025-12-04T10:32:29.6588120Z Alloc Granule: 4KB 2025-12-04T10:32:29.6588284Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6588448Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6588601Z Accessible by all: FALSE 2025-12-04T10:32:29.6588742Z Pool 4 2025-12-04T10:32:29.6588873Z Segment: GROUP 2025-12-04T10:32:29.6589012Z Size: 64(0x40) KB 2025-12-04T10:32:29.6589158Z Allocatable: FALSE 2025-12-04T10:32:29.6589308Z Alloc Granule: 0KB 2025-12-04T10:32:29.6589471Z Alloc Recommended Granule:0KB 2025-12-04T10:32:29.6589695Z Alloc Alignment: 0KB 2025-12-04T10:32:29.6589854Z Accessible by all: FALSE 2025-12-04T10:32:29.6589994Z ISA Info: 2025-12-04T10:32:29.6590100Z ISA 1 2025-12-04T10:32:29.6590227Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:32:29.6590392Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6590548Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6590747Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6590912Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6591061Z Fast f16: TRUE 2025-12-04T10:32:29.6591213Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6591362Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6591491Z x 1024(0x400) 2025-12-04T10:32:29.6591626Z y 1024(0x400) 2025-12-04T10:32:29.6591758Z z 1024(0x400) 
2025-12-04T10:32:29.6591898Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6592040Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6592160Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6592303Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6592438Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6592581Z FBarrier Max Size: 32 2025-12-04T10:32:29.6592722Z ISA 2 2025-12-04T10:32:29.6592865Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:32:29.6593035Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6593200Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6593357Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6593522Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6593677Z Fast f16: TRUE 2025-12-04T10:32:29.6593825Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6593974Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6594107Z x 1024(0x400) 2025-12-04T10:32:29.6594234Z y 1024(0x400) 2025-12-04T10:32:29.6594365Z z 1024(0x400) 2025-12-04T10:32:29.6594508Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6594642Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6594810Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6594938Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6595073Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6595218Z FBarrier Max Size: 32 2025-12-04T10:32:29.6595351Z ******* 2025-12-04T10:32:29.6595453Z Agent 6 2025-12-04T10:32:29.6595553Z ******* 2025-12-04T10:32:29.6595665Z Name: gfx942 2025-12-04T10:32:29.6595810Z Uuid: GPU-f24a9834b47f1628 2025-12-04T10:32:29.6595960Z Marketing Name: AMD Instinct MI325X 2025-12-04T10:32:29.6596120Z Vendor Name: AMD 2025-12-04T10:32:29.6596274Z Feature: KERNEL_DISPATCH 2025-12-04T10:32:29.6596426Z Profile: BASE_PROFILE 2025-12-04T10:32:29.6596580Z Float Round Mode: NEAR 2025-12-04T10:32:29.6596734Z Max Queue Number: 128(0x80) 2025-12-04T10:32:29.6596881Z Queue Min Size: 64(0x40) 2025-12-04T10:32:29.6597029Z Queue Max Size: 131072(0x20000) 2025-12-04T10:32:29.6597204Z Queue Type: MULTI 2025-12-04T10:32:29.6597346Z Node: 5 2025-12-04T10:32:29.6597488Z Device Type: GPU 2025-12-04T10:32:29.6597617Z Cache Info: 2025-12-04T10:32:29.6597734Z L1: 32(0x20) KB 2025-12-04T10:32:29.6597868Z L2: 4096(0x1000) KB 2025-12-04T10:32:29.6597998Z L3: 262144(0x40000) KB 2025-12-04T10:32:29.6598136Z Chip ID: 29861(0x74a5) 2025-12-04T10:32:29.6598281Z ASIC Revision: 1(0x1) 2025-12-04T10:32:29.6598437Z Cacheline Size: 128(0x80) 2025-12-04T10:32:29.6598594Z Max Clock Freq. (MHz): 2100 2025-12-04T10:32:29.6598734Z BDFID: 5376 2025-12-04T10:32:29.6598888Z Internal Node ID: 5 2025-12-04T10:32:29.6599041Z Compute Unit: 304 2025-12-04T10:32:29.6599186Z SIMDs per CU: 4 2025-12-04T10:32:29.6599337Z Shader Engines: 32 2025-12-04T10:32:29.6599494Z Shader Arrs. per Eng.: 1 2025-12-04T10:32:29.6599703Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:32:29.6599865Z Coherent Host Access: FALSE 2025-12-04T10:32:29.6600003Z Memory Properties: 2025-12-04T10:32:29.6600121Z Features: KERNEL_DISPATCH 2025-12-04T10:32:29.6600266Z Fast F16 Operation: TRUE 2025-12-04T10:32:29.6600420Z Wavefront Size: 64(0x40) 2025-12-04T10:32:29.6600577Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6600710Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6600827Z x 1024(0x400) 2025-12-04T10:32:29.6600945Z y 1024(0x400) 2025-12-04T10:32:29.6601066Z z 1024(0x400) 2025-12-04T10:32:29.6601202Z Max Waves Per CU: 32(0x20) 2025-12-04T10:32:29.6601388Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:32:29.6601540Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6601673Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6601783Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6601911Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6602034Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6602182Z Max fbarriers/Workgrp: 32 2025-12-04T10:32:29.6602344Z Packet Processor uCode:: 185 2025-12-04T10:32:29.6602500Z SDMA engine uCode:: 24 2025-12-04T10:32:29.6602654Z IOMMU Support:: None 2025-12-04T10:32:29.6602789Z Pool Info: 2025-12-04T10:32:29.6602888Z Pool 1 2025-12-04T10:32:29.6603015Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:32:29.6603170Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6603317Z Allocatable: TRUE 2025-12-04T10:32:29.6603468Z Alloc Granule: 4KB 2025-12-04T10:32:29.6603623Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6603783Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6603984Z Accessible by all: FALSE 2025-12-04T10:32:29.6604112Z Pool 2 2025-12-04T10:32:29.6604235Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:32:29.6604377Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6604514Z Allocatable: TRUE 2025-12-04T10:32:29.6604663Z Alloc Granule: 4KB 2025-12-04T10:32:29.6604825Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6604981Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6605268Z Accessible by all: FALSE 2025-12-04T10:32:29.6605581Z Pool 3 2025-12-04T10:32:29.6605703Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:32:29.6605852Z Size: 268419072(0xfffc000) KB 2025-12-04T10:32:29.6605992Z Allocatable: TRUE 2025-12-04T10:32:29.6606152Z Alloc Granule: 4KB 2025-12-04T10:32:29.6606309Z Alloc Recommended Granule:2048KB 2025-12-04T10:32:29.6606468Z Alloc Alignment: 4KB 2025-12-04T10:32:29.6606627Z Accessible by all: FALSE 2025-12-04T10:32:29.6606763Z Pool 4 2025-12-04T10:32:29.6606882Z Segment: GROUP 2025-12-04T10:32:29.6607019Z Size: 64(0x40) KB 2025-12-04T10:32:29.6607161Z Allocatable: FALSE 2025-12-04T10:32:29.6607307Z Alloc Granule: 0KB 2025-12-04T10:32:29.6607464Z Alloc Recommended Granule:0KB 2025-12-04T10:32:29.6607616Z Alloc Alignment: 0KB 2025-12-04T10:32:29.6607769Z Accessible by all: FALSE 2025-12-04T10:32:29.6607901Z ISA Info: 2025-12-04T10:32:29.6607996Z ISA 1 2025-12-04T10:32:29.6608119Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:32:29.6608312Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6608466Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6608620Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6608772Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6608920Z Fast f16: TRUE 2025-12-04T10:32:29.6609067Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6609207Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6609335Z x 1024(0x400) 2025-12-04T10:32:29.6609462Z y 1024(0x400) 2025-12-04T10:32:29.6609637Z z 1024(0x400) 
2025-12-04T10:32:29.6609776Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6609908Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6610031Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6610158Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6610281Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6610422Z FBarrier Max Size: 32 2025-12-04T10:32:29.6610555Z ISA 2 2025-12-04T10:32:29.6610729Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:32:29.6610897Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:32:29.6611050Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:32:29.6611208Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6611388Z Default Rounding Mode: NEAR 2025-12-04T10:32:29.6611538Z Fast f16: TRUE 2025-12-04T10:32:29.6611687Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:32:29.6611829Z Workgroup Max Size per Dimension: 2025-12-04T10:32:29.6611949Z x 1024(0x400) 2025-12-04T10:32:29.6612074Z y 1024(0x400) 2025-12-04T10:32:29.6612197Z z 1024(0x400) 2025-12-04T10:32:29.6612337Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:32:29.6612474Z Grid Max Size per Dimension: 2025-12-04T10:32:29.6612588Z x 4294967295(0xffffffff) 2025-12-04T10:32:29.6612716Z y 4294967295(0xffffffff) 2025-12-04T10:32:29.6612845Z z 4294967295(0xffffffff) 2025-12-04T10:32:29.6612983Z FBarrier Max Size: 32 2025-12-04T10:32:29.6613118Z *** Done *** 2025-12-04T10:32:29.6623279Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T10:32:29.6623458Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T10:32:29.6623733Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" 2025-12-04T10:32:29.6623992Z if [[ $ngpu -eq 0 ]]; then 2025-12-04T10:32:29.6624144Z  echo "Error: Failed to detect any GPUs on the runner" 2025-12-04T10:32:29.6624288Z  echo "$msg" 2025-12-04T10:32:29.6624382Z  exit 1 2025-12-04T10:32:29.6624474Z fi 2025-12-04T10:32:29.6627353Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.6627494Z env: 2025-12-04T10:32:29.6627582Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.6627681Z ##[endgroup] 2025-12-04T10:32:29.7730121Z ##[group]Run pytorch/pytorch/.github/actions/diskspace-cleanup@main 2025-12-04T10:32:29.7730332Z with: 2025-12-04T10:32:29.7730450Z diskspace-cutoff: 70 2025-12-04T10:32:29.7730573Z env: 2025-12-04T10:32:29.7730687Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.7730819Z ##[endgroup] 2025-12-04T10:32:29.7753418Z ##[group]Run set -ex 2025-12-04T10:32:29.7753553Z set -ex 2025-12-04T10:32:29.7753658Z diskspace_cutoff=70 2025-12-04T10:32:29.7753805Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-12-04T10:32:29.7753986Z if [ ! -d "$docker_root_dir" ]; then 2025-12-04T10:32:29.7754188Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-12-04T10:32:29.7754374Z  exit 0 2025-12-04T10:32:29.7754473Z fi 2025-12-04T10:32:29.7754644Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-12-04T10:32:29.7754974Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-12-04T10:32:29.7755258Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-12-04T10:32:29.7755411Z  docker system prune -af 2025-12-04T10:32:29.7755598Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-12-04T10:32:29.7755814Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-12-04T10:32:29.7756120Z  diskspace_cutoff_int=$((diskspace_cutoff + 0)) 2025-12-04T10:32:29.7756277Z  difference=$((100 - diskspace_cutoff_int)) 2025-12-04T10:32:29.7756487Z  echo "Error: Available diskspace is less than $difference percent. Not enough diskspace." 2025-12-04T10:32:29.7756677Z  echo "$msg" 2025-12-04T10:32:29.7756779Z  exit 1 2025-12-04T10:32:29.7756880Z  else 2025-12-04T10:32:29.7756996Z  difference=$((diskspace - diskspace_new)) 2025-12-04T10:32:29.7757155Z  echo "Diskspace saved: $difference percent" 2025-12-04T10:32:29.7757289Z  fi 2025-12-04T10:32:29.7757374Z fi 2025-12-04T10:32:29.7760538Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.7760687Z env: 2025-12-04T10:32:29.7760775Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.7760882Z ##[endgroup] 2025-12-04T10:32:29.7777063Z + diskspace_cutoff=70 2025-12-04T10:32:29.7780181Z ++ docker info -f '{{.DockerRootDir}}' 2025-12-04T10:32:29.8128643Z + docker_root_dir=/home/runner/docker-data 2025-12-04T10:32:29.8130225Z + '[' '!' -d /home/runner/docker-data ']' 2025-12-04T10:32:29.8137142Z ++ df -H --output=pcent /home/runner/docker-data 2025-12-04T10:32:29.8137359Z ++ sed -n 2p 2025-12-04T10:32:29.8137484Z ++ sed s/%// 2025-12-04T10:32:29.8139438Z ++ sed 's/ //' 2025-12-04T10:32:29.8155005Z + diskspace=' 3' 2025-12-04T10:32:29.8156499Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified' 2025-12-04T10:32:29.8157104Z + [[ 3 -ge 70 ]] 2025-12-04T10:32:29.8185951Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-12-04T10:32:29.8186194Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-12-04T10:32:29.8186355Z rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-12-04T10:32:29.8186505Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-12-04T10:32:29.8186703Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-12-04T10:32:29.8186883Z  2025-12-04T10:32:29.8187011Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-12-04T10:32:29.8187179Z rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-12-04T10:32:29.8187327Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-12-04T10:32:29.8187517Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-12-04T10:32:29.8187695Z  2025-12-04T10:32:29.8187975Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-12-04T10:32:29.8188112Z rm -rf "${RUNNER_DOCS_DIR}" 2025-12-04T10:32:29.8188244Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-12-04T10:32:29.8188407Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-12-04T10:32:29.8192991Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.8193137Z env: 2025-12-04T10:32:29.8193231Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.8193341Z ##[endgroup] 2025-12-04T10:32:29.8274922Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:32:29.8275199Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:32:29.8275402Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:32:29.8279723Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.8279879Z env: 2025-12-04T10:32:29.8280002Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.8280149Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:29.8280329Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:29.8280505Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:29.8280648Z ##[endgroup] 2025-12-04T10:32:29.8323696Z ##[group]Run # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-12-04T10:32:29.8324115Z # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-12-04T10:32:29.8324324Z # Add render group for container creation. 2025-12-04T10:32:29.8324503Z render_gid=`cat /etc/group | grep render | cut -d: -f3` 2025-12-04T10:32:29.8324712Z # Ensure GPU isolation if pod is part of kubernetes setup with DEVICE_FLAG. 2025-12-04T10:32:29.8324918Z if [ -f "/etc/podinfo/gha-render-devices" ]; then 2025-12-04T10:32:29.8325100Z  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices) 2025-12-04T10:32:29.8325244Z else 2025-12-04T10:32:29.8325351Z  DEVICE_FLAG="--device /dev/dri" 2025-12-04T10:32:29.8325467Z fi 2025-12-04T10:32:29.8325652Z # The --group-add daemon and --group-add bin are needed in the Ubuntu 24.04 and Almalinux OSs respectively. 2025-12-04T10:32:29.8325934Z # This is due to the device files (/dev/kfd & /dev/dri) being owned by video group on bare metal. 2025-12-04T10:32:29.8326189Z # This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries. 2025-12-04T10:32:29.8326458Z # The group name corresponding to group ID 1 can change depending on the OS, so both are necessary. 
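Aside (not part of the job output): the comments directly above explain why the test container is started with the video group, the host's render group, and the daemon/bin groups. As a minimal sketch of the same idea, assuming a bare-metal host that exposes /dev/kfd and /dev/dri and defines a render group in /etc/group, the GID lookup and device flags could be reproduced by hand roughly as follows; the /dev/mem, ptrace, seccomp and host-network flags that the real step also appends are omitted here.

#!/usr/bin/env bash
# Sketch only: rebuild the ROCm device/group flags described in the comments above.
set -euo pipefail
# getent is used here instead of grepping /etc/group; both return the render group's GID.
render_gid=$(getent group render | cut -d: -f3)
if [ -f /etc/podinfo/gha-render-devices ]; then
  # Kubernetes pods get an explicit list of render devices for GPU isolation.
  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices)
else
  # Bare metal simply exposes the whole DRI directory.
  DEVICE_FLAG="--device /dev/dri"
fi
GPU_FLAG="--device=/dev/kfd ${DEVICE_FLAG} --group-add video --group-add ${render_gid} --group-add daemon --group-add bin"
echo "GPU_FLAG=${GPU_FLAG}"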
2025-12-04T10:32:29.8326906Z echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}" 2025-12-04T10:32:29.8329931Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:29.8330073Z env: 2025-12-04T10:32:29.8330165Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.8330293Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:29.8330472Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:29.8330637Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:29.8330768Z ##[endgroup] 2025-12-04T10:32:29.8401119Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-12-04T10:32:29.8401317Z with: 2025-12-04T10:32:29.8401467Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only 2025-12-04T10:32:29.8401638Z aws-region: us-east-1 2025-12-04T10:32:29.8401754Z role-duration-seconds: 18000 2025-12-04T10:32:29.8401879Z audience: sts.amazonaws.com 2025-12-04T10:32:29.8401988Z env: 2025-12-04T10:32:29.8402083Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:29.8402321Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:29.8402495Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:29.8402659Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:29.8403159Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:29.8403649Z ##[endgroup] 2025-12-04T10:32:30.1515822Z Assuming role with OIDC 2025-12-04T10:32:30.5150809Z Authenticated as assumedRoleId AROAUPVRELQNLLCOPFEJR:GitHubActions 2025-12-04T10:32:30.6112937Z ##[group]Run aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 2025-12-04T10:32:30.6113158Z with: 2025-12-04T10:32:30.6113265Z mask-password: true 2025-12-04T10:32:30.6113397Z registry-type: private 2025-12-04T10:32:30.6113523Z skip-logout: false 2025-12-04T10:32:30.6113635Z env: 2025-12-04T10:32:30.6113743Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:30.6113890Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:30.6114078Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:30.6114258Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:30.6114956Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:30.6115467Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:30.6115594Z AWS_REGION: us-east-1 2025-12-04T10:32:30.6116030Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:30.6116199Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:30.6118268Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:30.6118381Z ##[endgroup] 2025-12-04T10:32:31.0445875Z Logging into registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.6974061Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 
2025-12-04T10:32:31.6974377Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:32:31.6974640Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:32:31.6974905Z env | grep '^RUNNER' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:32:31.6980763Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:31.6980952Z env: 2025-12-04T10:32:31.6981081Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:31.6981262Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:31.6981495Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:31.6981713Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:31.6982389Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:31.6982982Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:31.6983113Z AWS_REGION: us-east-1 2025-12-04T10:32:31.6983416Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:31.6983584Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:31.6985742Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:31.6985859Z ##[endgroup] 2025-12-04T10:32:31.7089552Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T10:32:31.7089966Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T10:32:31.7090217Z if [[ $ngpu -lt 2 ]]; then #We are temporarily reducing this down to 2 from 4 so that we can run tests on nodes with less gpus. 2025-12-04T10:32:31.7090508Z  echo "Error: only $ngpu GPU(s) detected, at least 2 GPUs are needed for distributed jobs" 2025-12-04T10:32:31.7090696Z  exit 1 2025-12-04T10:32:31.7090796Z fi 2025-12-04T10:32:31.7095107Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:31.7095259Z env: 2025-12-04T10:32:31.7095359Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:31.7095501Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:31.7095699Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:31.7095876Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:31.7096410Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:31.7096908Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:31.7097030Z AWS_REGION: us-east-1 2025-12-04T10:32:31.7097322Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:31.7097484Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:31.7099499Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:31.7099651Z ##[endgroup] 2025-12-04T10:32:31.8237598Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T10:32:31.8237812Z with: 2025-12-04T10:32:31.8238310Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8238648Z use-custom-docker-registry: true 2025-12-04T10:32:31.8238796Z docker-build-dir: .ci/docker 2025-12-04T10:32:31.8238939Z docker-build-script: ./build.sh 2025-12-04T10:32:31.8239076Z working-directory: . 
2025-12-04T10:32:31.8239236Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.8239415Z force-push: false 2025-12-04T10:32:31.8239529Z env: 2025-12-04T10:32:31.8239694Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:31.8239852Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:31.8240046Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:31.8240256Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:31.8240819Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:31.8241370Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:31.8241500Z AWS_REGION: us-east-1 2025-12-04T10:32:31.8241776Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:31.8241947Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:31.8244132Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:31.8244257Z ##[endgroup] 2025-12-04T10:32:31.8255079Z ##[group]Run set -ex 2025-12-04T10:32:31.8255221Z set -ex 2025-12-04T10:32:31.8255317Z  2025-12-04T10:32:31.8255473Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T10:32:31.8255718Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-12-04T10:32:31.8255929Z # job could then download the pre-built image as usual 2025-12-04T10:32:31.8256187Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T10:32:31.8256423Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8256557Z else 2025-12-04T10:32:31.8256668Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8256843Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8257002Z  2025-12-04T10:32:31.8257206Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 
2025-12-04T10:32:31.8257438Z  exit 0 2025-12-04T10:32:31.8257533Z fi 2025-12-04T10:32:31.8257626Z  2025-12-04T10:32:31.8257764Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T10:32:31.8257992Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T10:32:31.8258194Z  # use it as it is, but first let's extract the tag 2025-12-04T10:32:31.8258383Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T10:32:31.8258574Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8258759Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8258915Z else 2025-12-04T10:32:31.8259027Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T10:32:31.8259181Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T10:32:31.8259334Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T10:32:31.8259462Z  fi 2025-12-04T10:32:31.8259930Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T10:32:31.8260159Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8260439Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8260691Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8260851Z fi 2025-12-04T10:32:31.8265071Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:31.8265217Z env: 2025-12-04T10:32:31.8265309Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:31.8265450Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:31.8265629Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:31.8265798Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:31.8266310Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:31.8266804Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:31.8266923Z AWS_REGION: us-east-1 2025-12-04T10:32:31.8267071Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:31.8267229Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:31.8269395Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:31.8269507Z REPO_NAME: pytorch 2025-12-04T10:32:31.8269845Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8270138Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T10:32:31.8270259Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T10:32:31.8270411Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.8270575Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T10:32:31.8270694Z CUSTOM_TAG_PREFIX: 2025-12-04T10:32:31.8270797Z ##[endgroup] 2025-12-04T10:32:31.8288893Z + [[ -d .ci/docker ]] 2025-12-04T10:32:31.8289012Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T10:32:31.8289136Z + [[ true == \t\r\u\e ]] 2025-12-04T10:32:31.8289241Z + echo skip=false 2025-12-04T10:32:31.8289649Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a == 
*\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T10:32:31.8295017Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8296472Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T10:32:31.8307401Z + DOCKER_TAG=pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8307684Z + echo docker-tag=pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8308119Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8347599Z ##[group]Run set +e 2025-12-04T10:32:31.8347750Z set +e 2025-12-04T10:32:31.8347853Z set -x 2025-12-04T10:32:31.8347951Z  2025-12-04T10:32:31.8348045Z login() { 2025-12-04T10:32:31.8348252Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T10:32:31.8348455Z } 2025-12-04T10:32:31.8348547Z  2025-12-04T10:32:31.8348642Z retry () { 2025-12-04T10:32:31.8348765Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T10:32:31.8348897Z } 2025-12-04T10:32:31.8348990Z  2025-12-04T10:32:31.8349092Z retry login "${DOCKER_REGISTRY}" 2025-12-04T10:32:31.8349218Z  2025-12-04T10:32:31.8349464Z START_TIME=$(date +%s) 2025-12-04T10:32:31.8349746Z # Wait up to 120 minutes 2025-12-04T10:32:31.8350047Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T10:32:31.8350246Z  # Check if image already exists, if it does then skip building it 2025-12-04T10:32:31.8350446Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T10:32:31.8350594Z  exit 0 2025-12-04T10:32:31.8350697Z  fi 2025-12-04T10:32:31.8350793Z  2025-12-04T10:32:31.8350951Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T10:32:31.8351208Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T10:32:31.8351462Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T10:32:31.8351670Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T10:32:31.8351844Z  # It's a Docker build job, let's build the image 2025-12-04T10:32:31.8351984Z  break 2025-12-04T10:32:31.8352093Z  else 2025-12-04T10:32:31.8352232Z  # It's a regular build job, wait for the image to become available 2025-12-04T10:32:31.8352392Z  sleep 300 2025-12-04T10:32:31.8352498Z  fi 2025-12-04T10:32:31.8352592Z done 2025-12-04T10:32:31.8352685Z  2025-12-04T10:32:31.8352827Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T10:32:31.8353040Z # be empty. 
The default action would be to continue rebuild the image 2025-12-04T10:32:31.8353235Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T10:32:31.8353412Z  # if we're on the base branch then use the parent commit 2025-12-04T10:32:31.8353569Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T10:32:31.8353698Z else 2025-12-04T10:32:31.8353828Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T10:32:31.8354009Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T10:32:31.8354155Z fi 2025-12-04T10:32:31.8354244Z  2025-12-04T10:32:31.8354344Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T10:32:31.8354491Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8354623Z  2025-12-04T10:32:31.8354801Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T10:32:31.8355003Z  exit 0 2025-12-04T10:32:31.8355099Z fi 2025-12-04T10:32:31.8355189Z  2025-12-04T10:32:31.8355316Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T10:32:31.8355568Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T10:32:31.8355791Z  exit 1 2025-12-04T10:32:31.8355887Z fi 2025-12-04T10:32:31.8355978Z  2025-12-04T10:32:31.8356126Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T10:32:31.8356370Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T10:32:31.8356590Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T10:32:31.8356842Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T10:32:31.8357119Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T10:32:31.8357291Z fi 2025-12-04T10:32:31.8357383Z  2025-12-04T10:32:31.8357495Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T10:32:31.8360608Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:31.8360805Z env: 2025-12-04T10:32:31.8360904Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:31.8361048Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:31.8361283Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:31.8361452Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:31.8361963Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:31.8362468Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:31.8362590Z AWS_REGION: us-east-1 2025-12-04T10:32:31.8362751Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:31.8362910Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:31.8364905Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:31.8365023Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T10:32:31.8365166Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:32:31.8365480Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8365843Z DOCKER_TAG: pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:31.8366073Z DOCKER_REGISTRY: 
308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.8366225Z DOCKER_PUSH: 2025-12-04T10:32:31.8366323Z ##[endgroup] 2025-12-04T10:32:31.8385623Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.8385797Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.8388724Z + aws ecr get-login-password --region us-east-1 2025-12-04T10:32:31.8388930Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:31.8389206Z /home/runner/_work/_temp/f1912837-f3f9-4b32-8dc8-31249691bcf9.sh: line 5: aws: command not found 2025-12-04T10:32:31.8465480Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:31.8474178Z + sleep 1 2025-12-04T10:32:32.8484958Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:32.8488181Z + aws ecr get-login-password --region us-east-1 2025-12-04T10:32:32.8488585Z /home/runner/_work/_temp/f1912837-f3f9-4b32-8dc8-31249691bcf9.sh: line 5: aws: command not found 2025-12-04T10:32:32.8489252Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:32.8582014Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:32.8592693Z + sleep 2 2025-12-04T10:32:34.8604926Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:34.8608808Z + aws ecr get-login-password --region us-east-1 2025-12-04T10:32:34.8609323Z /home/runner/_work/_temp/f1912837-f3f9-4b32-8dc8-31249691bcf9.sh: line 5: aws: command not found 2025-12-04T10:32:34.8609997Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:34.8705173Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:34.8718116Z ++ date +%s 2025-12-04T10:32:34.8725210Z + START_TIME=1764844354 2025-12-04T10:32:34.8729405Z ++ date +%s 2025-12-04T10:32:34.8739645Z + [[ 1764837154 -lt 1764844354 ]] 2025-12-04T10:32:34.8740133Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:36.2737186Z { 2025-12-04T10:32:36.2737437Z "schemaVersion": 2, 2025-12-04T10:32:36.2737769Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T10:32:36.2738161Z "config": { 2025-12-04T10:32:36.2738405Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T10:32:36.2738674Z "size": 30520, 2025-12-04T10:32:36.2738977Z "digest": "sha256:45252333063339f104d56e41f20304e9511ab21c7768e8d156b95ddf24a9dbe5" 2025-12-04T10:32:36.2739995Z }, 2025-12-04T10:32:36.2740133Z "layers": [ 2025-12-04T10:32:36.2740282Z { 2025-12-04T10:32:36.2740508Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2740935Z "size": 30447951, 2025-12-04T10:32:36.2741211Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T10:32:36.2741506Z }, 2025-12-04T10:32:36.2741644Z { 2025-12-04T10:32:36.2741862Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2742127Z "size": 1554, 2025-12-04T10:32:36.2742397Z "digest": "sha256:835841cca3b7e1464290cdb78e48773e03583413fbed852c3cc5165a392ea44d" 2025-12-04T10:32:36.2742686Z }, 2025-12-04T10:32:36.2742817Z { 2025-12-04T10:32:36.2743031Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2743294Z "size": 313275691, 2025-12-04T10:32:36.2743573Z "digest": 
"sha256:aac69780afc8611a5f94a235792d39ae055249c8319ef43b78675998a9b2f825" 2025-12-04T10:32:36.2743862Z }, 2025-12-04T10:32:36.2743995Z { 2025-12-04T10:32:36.2744210Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2744476Z "size": 704, 2025-12-04T10:32:36.2744750Z "digest": "sha256:029495b23122c840ca0e52d487afa8d2c4dbf1991cd7f204ec3e434dcf947bf4" 2025-12-04T10:32:36.2745055Z }, 2025-12-04T10:32:36.2745190Z { 2025-12-04T10:32:36.2745396Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2745657Z "size": 1218, 2025-12-04T10:32:36.2745953Z "digest": "sha256:d0fb85b008332051a3f7c052721ef68bde404b46c23fa43ad040373bd367826c" 2025-12-04T10:32:36.2746244Z }, 2025-12-04T10:32:36.2746370Z { 2025-12-04T10:32:36.2746580Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2746846Z "size": 484, 2025-12-04T10:32:36.2747135Z "digest": "sha256:59b63930883363c7d2aaab27cc61555d9f3e119dc18247a8624c98ebdaa354a5" 2025-12-04T10:32:36.2747367Z }, 2025-12-04T10:32:36.2747480Z { 2025-12-04T10:32:36.2747654Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2747877Z "size": 110363202, 2025-12-04T10:32:36.2748106Z "digest": "sha256:dc112c89d57aa1e85082e40a56e5bc743d64f834ae2f98afe91f60c248354d38" 2025-12-04T10:32:36.2748337Z }, 2025-12-04T10:32:36.2748453Z { 2025-12-04T10:32:36.2748620Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2748826Z "size": 4436, 2025-12-04T10:32:36.2749036Z "digest": "sha256:522eab2402e5001810155ef7eb56940b7c01a4fef62ac588886981c3b8ee8e1e" 2025-12-04T10:32:36.2749272Z }, 2025-12-04T10:32:36.2749377Z { 2025-12-04T10:32:36.2749547Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2749827Z "size": 1755, 2025-12-04T10:32:36.2750050Z "digest": "sha256:2b5a11b41761d8ea3b829e4772e4064cb6c4e4989126af324d0057661e4493a1" 2025-12-04T10:32:36.2750282Z }, 2025-12-04T10:32:36.2750388Z { 2025-12-04T10:32:36.2750567Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2750779Z "size": 724, 2025-12-04T10:32:36.2750993Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T10:32:36.2751235Z }, 2025-12-04T10:32:36.2751340Z { 2025-12-04T10:32:36.2751506Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2751727Z "size": 3185588166, 2025-12-04T10:32:36.2751951Z "digest": "sha256:73e33534e9eb94cf29418d65944168962b65fe21f55e9b8bad18c76e9b3a37b8" 2025-12-04T10:32:36.2752188Z }, 2025-12-04T10:32:36.2752304Z { 2025-12-04T10:32:36.2752469Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2752676Z "size": 396, 2025-12-04T10:32:36.2752895Z "digest": "sha256:5bfdaeb5578d6ffcd7db29c48303cbceb13c591210feaa216a8daa7a6d445b4b" 2025-12-04T10:32:36.2753137Z }, 2025-12-04T10:32:36.2753241Z { 2025-12-04T10:32:36.2753480Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2753690Z "size": 236863, 2025-12-04T10:32:36.2753909Z "digest": "sha256:c07d27e4d3a5ba4ad5325bb785b2e4f058fe5e10ec1aeeb413a1e152b073f203" 2025-12-04T10:32:36.2754201Z }, 2025-12-04T10:32:36.2754310Z { 2025-12-04T10:32:36.2754474Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2754684Z "size": 787, 2025-12-04T10:32:36.2754901Z "digest": "sha256:b21856d1bf420da6fa8ec7331b82ab355d4f4178644e7d3a3d3d0fbc3610109a" 
2025-12-04T10:32:36.2755140Z }, 2025-12-04T10:32:36.2755249Z { 2025-12-04T10:32:36.2755429Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2755639Z "size": 106, 2025-12-04T10:32:36.2755855Z "digest": "sha256:cb19d84867e4063f55db9459c28c50a2abc37c06d3c1ca82ba95fa8427cc438a" 2025-12-04T10:32:36.2756095Z }, 2025-12-04T10:32:36.2756203Z { 2025-12-04T10:32:36.2756371Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2756591Z "size": 1496, 2025-12-04T10:32:36.2756815Z "digest": "sha256:8165374f8dccf88a7791a5d31afbe29e4d4542b4f1cf1904945e07f9af6bf8ba" 2025-12-04T10:32:36.2757050Z }, 2025-12-04T10:32:36.2757139Z { 2025-12-04T10:32:36.2757273Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2757439Z "size": 458789560, 2025-12-04T10:32:36.2757629Z "digest": "sha256:1aecc77354ceba59ec6f0d37a558f2dbb6d5c0854553ee8505ac8707b422da6d" 2025-12-04T10:32:36.2757817Z }, 2025-12-04T10:32:36.2757905Z { 2025-12-04T10:32:36.2758039Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2758204Z "size": 164, 2025-12-04T10:32:36.2758371Z "digest": "sha256:465d3fd643aa2ea0ad07335cda66f12f1d7e5e800c4e9385ec466bc8a1ceabda" 2025-12-04T10:32:36.2758565Z }, 2025-12-04T10:32:36.2758653Z { 2025-12-04T10:32:36.2758783Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2758994Z "size": 104, 2025-12-04T10:32:36.2759341Z "digest": "sha256:6c503e779d6f41ca7f51309875df2b725c171926aece7009c4b8a64d1ba3f58e" 2025-12-04T10:32:36.2759686Z }, 2025-12-04T10:32:36.2759781Z { 2025-12-04T10:32:36.2759918Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2760094Z "size": 724, 2025-12-04T10:32:36.2760266Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T10:32:36.2760453Z }, 2025-12-04T10:32:36.2760540Z { 2025-12-04T10:32:36.2760675Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2760849Z "size": 196, 2025-12-04T10:32:36.2761003Z + exit 0 2025-12-04T10:32:36.2761172Z "digest": "sha256:f7e9a021f0ee3d11a50dcb96378af8103a21f6c3c142f54529207648f3ed00b2" 2025-12-04T10:32:36.2761362Z }, 2025-12-04T10:32:36.2761446Z { 2025-12-04T10:32:36.2761582Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2761750Z "size": 2583, 2025-12-04T10:32:36.2761915Z "digest": "sha256:8e023b349080fb11ee55491bc9b842b30e9e3a90246d05b303a73dc62038caf2" 2025-12-04T10:32:36.2762103Z }, 2025-12-04T10:32:36.2762189Z { 2025-12-04T10:32:36.2762323Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2762496Z "size": 7577171420, 2025-12-04T10:32:36.2762670Z "digest": "sha256:8188df80e595a3dbcf84623c6a58a655269898cbb60029435f136d7f9d34ccaa" 2025-12-04T10:32:36.2762857Z }, 2025-12-04T10:32:36.2762938Z { 2025-12-04T10:32:36.2763073Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2763242Z "size": 135, 2025-12-04T10:32:36.2763421Z "digest": "sha256:3c2c2f8c74bfa16c4bf9a832c97bbb1d55205b2b4a2cead02cf74301ca1001fb" 2025-12-04T10:32:36.2763609Z }, 2025-12-04T10:32:36.2763696Z { 2025-12-04T10:32:36.2763832Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2763995Z "size": 104, 2025-12-04T10:32:36.2764219Z "digest": "sha256:2aa7784fbe3300f8bbfb6bb51cff3b01fd091e829c2bc7ab9e25261a0dd9b3bd" 2025-12-04T10:32:36.2764411Z }, 
2025-12-04T10:32:36.2764495Z { 2025-12-04T10:32:36.2764674Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2764838Z "size": 612, 2025-12-04T10:32:36.2765010Z "digest": "sha256:2b3b5215d3ebe8789f0444457bfd5a6e218289b64aa07653ac3d03ddda5e6708" 2025-12-04T10:32:36.2765195Z }, 2025-12-04T10:32:36.2765281Z { 2025-12-04T10:32:36.2765418Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2765585Z "size": 838191945, 2025-12-04T10:32:36.2765764Z "digest": "sha256:99b1f1ea3e857834cebd01763d90fbd700aeb9c2d2ef23eda2cfff5652c9708b" 2025-12-04T10:32:36.2765954Z }, 2025-12-04T10:32:36.2766041Z { 2025-12-04T10:32:36.2766170Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2766356Z "size": 111, 2025-12-04T10:32:36.2766528Z "digest": "sha256:18d6daba0a5768a37ad106b57974f6b7efd35c43a87c246bcd3f43fea88f2d2b" 2025-12-04T10:32:36.2766719Z }, 2025-12-04T10:32:36.2766803Z { 2025-12-04T10:32:36.2766934Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2767087Z "size": 1555, 2025-12-04T10:32:36.2767243Z "digest": "sha256:5277f2a503ebd17ba9d9b86cc9bac86265504adeb449c0647616ddaacd3cbc41" 2025-12-04T10:32:36.2767414Z }, 2025-12-04T10:32:36.2767489Z { 2025-12-04T10:32:36.2767610Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2767760Z "size": 107, 2025-12-04T10:32:36.2767911Z "digest": "sha256:3198a9717aace920fd5de085319adf75091af05fc4318ce4b16a8a5b0e8d449e" 2025-12-04T10:32:36.2768084Z }, 2025-12-04T10:32:36.2768160Z { 2025-12-04T10:32:36.2768280Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2768430Z "size": 166, 2025-12-04T10:32:36.2768580Z "digest": "sha256:99a4918e5808277879449e97ccd7190db6b9aa2d742b57a3b831ce0198522bdd" 2025-12-04T10:32:36.2768750Z }, 2025-12-04T10:32:36.2768830Z { 2025-12-04T10:32:36.2768950Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2769102Z "size": 3526081, 2025-12-04T10:32:36.2769259Z "digest": "sha256:15bb11dfc6acc3537d527d6771c8e711e5605e99f82ec41e805d4600b8a97516" 2025-12-04T10:32:36.2769428Z }, 2025-12-04T10:32:36.2769507Z { 2025-12-04T10:32:36.2769687Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2769836Z "size": 107, 2025-12-04T10:32:36.2769990Z "digest": "sha256:bd87c8766e90e33db17514558ac591cc3f4149afd7abeaef4dd5770bbfa14210" 2025-12-04T10:32:36.2770161Z }, 2025-12-04T10:32:36.2770238Z { 2025-12-04T10:32:36.2770360Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2770510Z "size": 829, 2025-12-04T10:32:36.2770661Z "digest": "sha256:1969e15d0c13874ea5883ed829235a19ef6dc21c8aa6172032b78a8ffa6ff262" 2025-12-04T10:32:36.2770830Z }, 2025-12-04T10:32:36.2770908Z { 2025-12-04T10:32:36.2771028Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2771182Z "size": 26973054, 2025-12-04T10:32:36.2771341Z "digest": "sha256:24a03847d382b73c11969f8f73916a6bedf5ccea12f6f4290b3880f29ceda32a" 2025-12-04T10:32:36.2771508Z }, 2025-12-04T10:32:36.2771585Z { 2025-12-04T10:32:36.2771706Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2771856Z "size": 104, 2025-12-04T10:32:36.2772009Z "digest": "sha256:816e2e34e01839a35d624dbf4bd9ac9bea4c975104af47a0e6b6b6dee6c6f98d" 2025-12-04T10:32:36.2772180Z }, 2025-12-04T10:32:36.2772257Z { 2025-12-04T10:32:36.2772377Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2772527Z "size": 424, 2025-12-04T10:32:36.2772680Z "digest": "sha256:b168858b85373f8ddca549d79267a06de4fa945d04bf791c55c9ddc93957fa3c" 2025-12-04T10:32:36.2772847Z }, 2025-12-04T10:32:36.2772924Z { 2025-12-04T10:32:36.2773091Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2773243Z "size": 19309386, 2025-12-04T10:32:36.2773452Z "digest": "sha256:6b8d5ff02e267e38322afbb8a58ed63ce9d75b10e9e73255e6affcbc6b6539bf" 2025-12-04T10:32:36.2773623Z }, 2025-12-04T10:32:36.2773697Z { 2025-12-04T10:32:36.2773818Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2773969Z "size": 826, 2025-12-04T10:32:36.2774122Z "digest": "sha256:4e3b10a5dd6aed29f238d604925e2a4f873141c1087c8dd4fdde5c61e7560893" 2025-12-04T10:32:36.2774292Z }, 2025-12-04T10:32:36.2774367Z { 2025-12-04T10:32:36.2774488Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2774638Z "size": 724, 2025-12-04T10:32:36.2774786Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T10:32:36.2774952Z }, 2025-12-04T10:32:36.2775030Z { 2025-12-04T10:32:36.2775156Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2775307Z "size": 149, 2025-12-04T10:32:36.2775460Z "digest": "sha256:3092fab73b59190b9facfc49bf18f58612172bc2fd68dfa339a1118632616939" 2025-12-04T10:32:36.2775631Z }, 2025-12-04T10:32:36.2775708Z { 2025-12-04T10:32:36.2775830Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2775979Z "size": 136, 2025-12-04T10:32:36.2776134Z "digest": "sha256:20020dd28a15ba092fcbfe906ee39cdddfcc9d0b7eb42fdd6f4c08a984fa9c00" 2025-12-04T10:32:36.2776308Z }, 2025-12-04T10:32:36.2776383Z { 2025-12-04T10:32:36.2776503Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2776653Z "size": 140, 2025-12-04T10:32:36.2776806Z "digest": "sha256:ae5280ce969dcff08c091e9a5f7641f13561b2b0ee44d78b7c3f81d8fe8e6d32" 2025-12-04T10:32:36.2776977Z }, 2025-12-04T10:32:36.2777054Z { 2025-12-04T10:32:36.2777175Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2777328Z "size": 32, 2025-12-04T10:32:36.2777482Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T10:32:36.2777656Z }, 2025-12-04T10:32:36.2777731Z { 2025-12-04T10:32:36.2777852Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2778002Z "size": 222, 2025-12-04T10:32:36.2778156Z "digest": "sha256:fe17d9eb0fd26d3af4c724bf570d833978b131cedb7dc17a800aa388a246b3cd" 2025-12-04T10:32:36.2778326Z }, 2025-12-04T10:32:36.2778405Z { 2025-12-04T10:32:36.2778527Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2778678Z "size": 346, 2025-12-04T10:32:36.2778830Z "digest": "sha256:a51e0dab2d596e6563483f27c12660007160847d177ba4c31812a8f44ada5754" 2025-12-04T10:32:36.2778996Z }, 2025-12-04T10:32:36.2779073Z { 2025-12-04T10:32:36.2779195Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2779348Z "size": 88300, 2025-12-04T10:32:36.2779511Z "digest": "sha256:6eb176cefd72d37ecbcdf074289a8f1de732d8816cc695ece7e4709d098094d6" 2025-12-04T10:32:36.2779723Z }, 2025-12-04T10:32:36.2779803Z { 2025-12-04T10:32:36.2779923Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2780074Z "size": 106, 
2025-12-04T10:32:36.2780228Z "digest": "sha256:e7b8cf2e8d5a4c56db9726ce62c1176032408b3b1c25a000592361cb4245e2b5" 2025-12-04T10:32:36.2780397Z }, 2025-12-04T10:32:36.2780474Z { 2025-12-04T10:32:36.2780595Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2780746Z "size": 1671, 2025-12-04T10:32:36.2780902Z "digest": "sha256:ef3a5060abce88884bc8bd815aa41c46427f34eeb132fe0ddd85a3f86e6dc83d" 2025-12-04T10:32:36.2781073Z }, 2025-12-04T10:32:36.2781149Z { 2025-12-04T10:32:36.2781271Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2781424Z "size": 724, 2025-12-04T10:32:36.2781620Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T10:32:36.2781789Z }, 2025-12-04T10:32:36.2781865Z { 2025-12-04T10:32:36.2782022Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2782176Z "size": 138, 2025-12-04T10:32:36.2782332Z "digest": "sha256:a6f4ec14b42b8f0a83d20aa6a985ddb6a1bf64e0ed3d44afd3484b87d4ed5ad3" 2025-12-04T10:32:36.2782506Z }, 2025-12-04T10:32:36.2782582Z { 2025-12-04T10:32:36.2782701Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2782849Z "size": 119, 2025-12-04T10:32:36.2783002Z "digest": "sha256:7e5a0c956cfbd6f8074fbfd3b1d416e6635d632835ec00c8dd4c015a21da19b4" 2025-12-04T10:32:36.2783172Z }, 2025-12-04T10:32:36.2783247Z { 2025-12-04T10:32:36.2783370Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2783522Z "size": 6238423049, 2025-12-04T10:32:36.2783690Z "digest": "sha256:b4f78730cfe76ce091b78b2e2e3d52be03f1097b3e4c3de5bd79f8d13a853132" 2025-12-04T10:32:36.2783862Z }, 2025-12-04T10:32:36.2783939Z { 2025-12-04T10:32:36.2784060Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2784214Z "size": 174, 2025-12-04T10:32:36.2784364Z "digest": "sha256:081028f24389b112683689fd362e8c0d6f358082710e72feab91cea6383feb4d" 2025-12-04T10:32:36.2784529Z }, 2025-12-04T10:32:36.2784605Z { 2025-12-04T10:32:36.2784729Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2784879Z "size": 1896, 2025-12-04T10:32:36.2785037Z "digest": "sha256:a534dcf4b9a9e5fabed742c8a8fc43c9cfe7346ea88ab3c177c3b14fd3afe00a" 2025-12-04T10:32:36.2785210Z }, 2025-12-04T10:32:36.2785286Z { 2025-12-04T10:32:36.2785407Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2785558Z "size": 197577597, 2025-12-04T10:32:36.2785715Z "digest": "sha256:2e77500302cc13224427e1d74e471bd79d5109ba6a5099a83df1d10b786f71ba" 2025-12-04T10:32:36.2785885Z }, 2025-12-04T10:32:36.2785961Z { 2025-12-04T10:32:36.2786083Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2786238Z "size": 304, 2025-12-04T10:32:36.2786466Z "digest": "sha256:bc08246bb4ba18c3ec5bc69e16b6b4e929c5bd0f3fae10eeb0b1a622a63d6fa2" 2025-12-04T10:32:36.2786639Z }, 2025-12-04T10:32:36.2786717Z { 2025-12-04T10:32:36.2786840Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2786992Z "size": 32, 2025-12-04T10:32:36.2787147Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T10:32:36.2787316Z }, 2025-12-04T10:32:36.2787391Z { 2025-12-04T10:32:36.2787511Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2787661Z "size": 106, 2025-12-04T10:32:36.2787815Z "digest": 
"sha256:ff0c473ca120ebdcaa2ba10b3274e82032edd5196019e76d4e7584553704ae81" 2025-12-04T10:32:36.2787986Z }, 2025-12-04T10:32:36.2788065Z { 2025-12-04T10:32:36.2788187Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T10:32:36.2788339Z "size": 54145662, 2025-12-04T10:32:36.2788503Z "digest": "sha256:6bbc14b250efb3cdaad12c91573c6bb9129ad3e3432f0ed1a7eaebc9958d162f" 2025-12-04T10:32:36.2788675Z } 2025-12-04T10:32:36.2788750Z ] 2025-12-04T10:32:36.2788828Z } 2025-12-04T10:32:36.2804646Z ##[group]Run set -eux 2025-12-04T10:32:36.2804763Z set -eux 2025-12-04T10:32:36.2804922Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T10:32:36.2805345Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T10:32:36.2810001Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:36.2810150Z env: 2025-12-04T10:32:36.2810242Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:36.2810435Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:36.2810610Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:36.2810814Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:36.2811324Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:36.2811816Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:36.2811930Z AWS_REGION: us-east-1 2025-12-04T10:32:36.2812150Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:36.2812304Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:36.2814296Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:36.2814400Z ##[endgroup] 2025-12-04T10:32:36.2837524Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T10:32:36.2838080Z /home/runner/_work/_temp/2d96020d-af1a-42cb-b736-57bd79e6fe84.sh: line 3: aws: command not found 2025-12-04T10:32:36.2838524Z + jq --raw-output .SecretString 2025-12-04T10:32:36.2839888Z + jq -r .docker_hub_readonly_token 2025-12-04T10:32:36.2841223Z + docker login --username pytorchbot --password-stdin 2025-12-04T10:32:36.2942082Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:36.2949753Z + true 2025-12-04T10:32:36.3001617Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T10:32:36.3001793Z with: 2025-12-04T10:32:36.3002061Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:36.3002394Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:36.3002548Z env: 2025-12-04T10:32:36.3002641Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:36.3002782Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:36.3002961Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:36.3003139Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:36.3003657Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video 
--group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:36.3004152Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:36.3004268Z AWS_REGION: us-east-1 2025-12-04T10:32:36.3004407Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:36.3004563Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:36.3006578Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:36.3006681Z ##[endgroup] 2025-12-04T10:32:36.3013309Z ##[group]Run set -x 2025-12-04T10:32:36.3013417Z set -x 2025-12-04T10:32:36.3013507Z set +e 2025-12-04T10:32:36.3013596Z  2025-12-04T10:32:36.3013681Z login() { 2025-12-04T10:32:36.3013865Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T10:32:36.3014059Z } 2025-12-04T10:32:36.3014143Z  2025-12-04T10:32:36.3014227Z retry () { 2025-12-04T10:32:36.3014338Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T10:32:36.3014461Z } 2025-12-04T10:32:36.3014546Z  2025-12-04T10:32:36.3014639Z retry login "${DOCKER_REGISTRY}" 2025-12-04T10:32:36.3014756Z  2025-12-04T10:32:36.3014939Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T10:32:36.3015184Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T10:32:36.3015329Z  2025-12-04T10:32:36.3015411Z set -e 2025-12-04T10:32:36.3015544Z # ignore output since only exit code is used for conditional 2025-12-04T10:32:36.3015727Z # only pull docker image if it's not available locally 2025-12-04T10:32:36.3015984Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T10:32:36.3016170Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T10:32:36.3016293Z fi 2025-12-04T10:32:36.3018863Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:36.3019010Z env: 2025-12-04T10:32:36.3019102Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:36.3019236Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:36.3019408Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:36.3019619Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:36.3020116Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:36.3020610Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:36.3020725Z AWS_REGION: us-east-1 2025-12-04T10:32:36.3020863Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:36.3021020Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:36.3023006Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:36.3023357Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:36.3023672Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:36.3023820Z ##[endgroup] 2025-12-04T10:32:36.3040899Z + set +e 2025-12-04T10:32:36.3041115Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:36.3041304Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:36.3044354Z + aws ecr get-login-password --region us-east-1 2025-12-04T10:32:36.3044573Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:36.3044846Z 
/home/runner/_work/_temp/58254ef0-9b6c-4873-95c4-52ff485189c5.sh: line 5: aws: command not found 2025-12-04T10:32:36.3116951Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:36.3127107Z + sleep 1 2025-12-04T10:32:37.3138457Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:37.3141494Z + aws ecr get-login-password --region us-east-1 2025-12-04T10:32:37.3141880Z /home/runner/_work/_temp/58254ef0-9b6c-4873-95c4-52ff485189c5.sh: line 5: aws: command not found 2025-12-04T10:32:37.3142438Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:37.3244145Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:37.3255785Z + sleep 2 2025-12-04T10:32:39.3267509Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:39.3270229Z + aws ecr get-login-password --region us-east-1 2025-12-04T10:32:39.3270790Z /home/runner/_work/_temp/58254ef0-9b6c-4873-95c4-52ff485189c5.sh: line 5: aws: command not found 2025-12-04T10:32:39.3272512Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T10:32:39.3360956Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T10:32:39.3379737Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:39.3380404Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T10:32:40.7112541Z + IMAGE_SIZE=18171.470620155334 2025-12-04T10:32:40.7112761Z + echo 'Compressed size of image in MB: 18171.470620155334' 2025-12-04T10:32:40.7112931Z + set -e 2025-12-04T10:32:40.7113225Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:32:40.7113553Z Compressed size of image in MB: 18171.470620155334 2025-12-04T10:32:40.7303199Z Prepare all required actions 2025-12-04T10:32:40.7317735Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T10:32:40.7317876Z with: 2025-12-04T10:32:40.7318120Z github-token: *** 2025-12-04T10:32:40.7318218Z env: 2025-12-04T10:32:40.7318308Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:40.7318442Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:40.7318616Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:40.7318799Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:40.7319304Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:40.7319856Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:40.7319983Z AWS_REGION: us-east-1 2025-12-04T10:32:40.7320104Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:40.7320270Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:40.7322253Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:40.7322355Z ##[endgroup] 2025-12-04T10:32:40.7328383Z ##[group]Run set -eux 2025-12-04T10:32:40.7328497Z set -eux 2025-12-04T10:32:40.7328671Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T10:32:40.7332639Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:32:40.7332785Z env: 
2025-12-04T10:32:40.7332878Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:40.7333014Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:40.7333188Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:40.7333353Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:40.7333861Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:40.7334349Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:40.7334465Z AWS_REGION: us-east-1 2025-12-04T10:32:40.7334599Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:40.7334861Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:40.7336839Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:40.7336990Z GITHUB_TOKEN: *** 2025-12-04T10:32:40.7337085Z ##[endgroup] 2025-12-04T10:32:40.7353270Z + python3 .github/scripts/get_workflow_job_id.py 19922849170 linux.rocm.gpu.gfx942.4.b-bphpw-runner-5l4hk 2025-12-04T10:32:41.9338791Z Setting output job-id=57116213187 2025-12-04T10:32:41.9339494Z Setting output job-name=linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:32:41.9438961Z Prepare all required actions 2025-12-04T10:32:41.9439200Z Getting action download info 2025-12-04T10:32:42.1630306Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T10:32:43.2837396Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T10:32:44.3142136Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T10:32:44.3142328Z with: 2025-12-04T10:32:44.3142450Z name: linux-jammy-rocm-py3.10 2025-12-04T10:32:44.3142596Z s3-bucket: gha-artifacts 2025-12-04T10:32:44.3142725Z env: 2025-12-04T10:32:44.3142844Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:44.3142996Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:44.3143200Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:44.3143385Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:44.3143975Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:44.3144639Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:44.3144775Z AWS_REGION: us-east-1 2025-12-04T10:32:44.3144984Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:44.3145154Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:44.3147194Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:44.3147315Z ##[endgroup] 2025-12-04T10:32:44.3173439Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T10:32:44.3173582Z with: 2025-12-04T10:32:44.3173686Z name: linux-jammy-rocm-py3.10 2025-12-04T10:32:44.3173810Z s3-bucket: gha-artifacts 2025-12-04T10:32:44.3173924Z region: us-east-1 2025-12-04T10:32:44.3174024Z env: 2025-12-04T10:32:44.3174124Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:32:44.3174289Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:32:44.3174470Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:32:44.3174642Z 
RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:32:44.3175143Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:32:44.3175629Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:32:44.3175745Z AWS_REGION: us-east-1 2025-12-04T10:32:44.3175880Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:32:44.3176041Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:32:44.3178018Z AWS_SESSION_TOKEN: *** 2025-12-04T10:32:44.3178123Z ##[endgroup] 2025-12-04T10:32:44.5456005Z (node:17062) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T10:32:44.5456220Z 2025-12-04T10:32:44.5456527Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T10:32:44.5456774Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T10:32:44.5457022Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T10:32:44.8121158Z Found 1 objects with prefix pytorch/pytorch/19922849170/linux-jammy-rocm-py3.10/ 2025-12-04T10:32:44.8121672Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T10:33:50.4971816Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T10:33:50.4974479Z Artifact download has finished successfully 2025-12-04T10:33:50.5264068Z ##[group]Run unzip -o artifacts.zip 2025-12-04T10:33:50.5264288Z unzip -o artifacts.zip 2025-12-04T10:33:50.5268719Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:33:50.5268886Z env: 2025-12-04T10:33:50.5269173Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:50.5269328Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:50.5269526Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:50.5269791Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:50.5270356Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:50.5270916Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:50.5271046Z AWS_REGION: us-east-1 2025-12-04T10:33:50.5271238Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:50.5271407Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:50.5273610Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:50.5273726Z ##[endgroup] 2025-12-04T10:33:50.5312488Z Archive: artifacts.zip 2025-12-04T10:33:50.5313092Z creating: dist/ 2025-12-04T10:33:50.5396336Z inflating: dist/.ninja_log 2025-12-04T10:33:53.4696228Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T10:33:53.4697309Z creating: build/ 2025-12-04T10:33:53.4697580Z creating: build/custom_test_artifacts/ 2025-12-04T10:33:53.4697969Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T10:33:53.4698425Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T10:33:53.4698967Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T10:33:53.4699773Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 
2025-12-04T10:33:53.4700363Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T10:33:53.4700947Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T10:33:53.4701590Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T10:33:53.4702198Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T10:33:53.4702908Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T10:33:53.4703612Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T10:33:53.4704273Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T10:33:53.4704915Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T10:33:53.4705538Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T10:33:53.4706272Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T10:33:53.4707006Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T10:33:53.4707674Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T10:33:53.4708413Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T10:33:53.4709207Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T10:33:53.4709729Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T10:33:53.4710089Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T10:33:53.4710467Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T10:33:53.4710861Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T10:33:53.4711462Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T10:33:53.4711949Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T10:33:53.4712418Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-12-04T10:33:53.4712860Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T10:33:53.4713315Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T10:33:53.4713773Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T10:33:53.4714226Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T10:33:53.4714679Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T10:33:53.4715134Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T10:33:53.4720477Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T10:33:53.4827650Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T10:33:53.4828091Z 
creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T10:33:53.4828472Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T10:33:53.4828877Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T10:33:53.4829259Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T10:33:53.4829670Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T10:33:53.4830049Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T10:33:53.4830421Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T10:33:53.4830791Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T10:33:53.4831164Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T10:33:53.4831531Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T10:33:53.4841600Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T10:33:53.4885260Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T10:33:53.4885644Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T10:33:53.4885977Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T10:33:53.4886274Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T10:33:53.4886556Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T10:33:53.4886830Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T10:33:53.4887115Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_outer_vec.cc 2025-12-04T10:33:53.4887393Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_vec_ext.cc 2025-12-04T10:33:53.4888046Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T10:33:53.4888346Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T10:33:53.4888823Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T10:33:53.4980449Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T10:33:53.5009927Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T10:33:53.5010244Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T10:33:53.5010529Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-12-04T10:33:53.5010854Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T10:33:53.5012623Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T10:33:53.5013012Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T10:33:53.5013365Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T10:33:53.5013749Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 
2025-12-04T10:33:53.5014115Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T10:33:53.5014554Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T10:33:53.5014995Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T10:33:53.5015481Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T10:33:53.5015863Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T10:33:53.5016242Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T10:33:53.5016690Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T10:33:53.5017400Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T10:33:53.5017733Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T10:33:53.5018828Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T10:33:53.5019674Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T10:33:53.5019987Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T10:33:53.5020228Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T10:33:53.5020486Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T10:33:53.5020751Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T10:33:53.5021048Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T10:33:53.5021390Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T10:33:53.5021715Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T10:33:53.5022014Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T10:33:53.5022331Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T10:33:53.5022652Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T10:33:53.5022964Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T10:33:53.5023279Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T10:33:53.5023594Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T10:33:53.5033716Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T10:33:53.5067511Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T10:33:53.5067830Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T10:33:53.5068122Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T10:33:53.5068378Z extracting: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T10:33:53.5068622Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T10:33:53.5069182Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T10:33:53.5069440Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_outer_vec.cc 2025-12-04T10:33:53.5069702Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_vec_ext.cc 2025-12-04T10:33:53.5070545Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T10:33:53.5071000Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T10:33:53.5071411Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T10:33:53.5091807Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T10:33:53.5092121Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T10:33:53.5092432Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T10:33:53.5092788Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T10:33:53.5094390Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T10:33:53.5094795Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T10:33:53.5095188Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T10:33:53.5095622Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T10:33:53.5096048Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T10:33:53.5096541Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T10:33:53.5097031Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T10:33:53.5097500Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T10:33:53.5097939Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T10:33:53.5098357Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T10:33:53.5098992Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T10:33:53.5099794Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T10:33:53.5100246Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T10:33:53.5101258Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T10:33:53.5101921Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T10:33:53.5102354Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T10:33:53.5102705Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T10:33:53.5103063Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T10:33:53.5103461Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T10:33:53.5103963Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T10:33:53.5104443Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T10:33:53.5104912Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T10:33:53.5105336Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T10:33:53.5105783Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T10:33:53.5106243Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T10:33:53.5106696Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T10:33:53.5107140Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T10:33:53.5107577Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T10:33:53.5108111Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T10:33:53.5170645Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T10:33:53.5170984Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T10:33:53.5171345Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T10:33:53.5171730Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T10:33:53.5172102Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T10:33:53.5172443Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T10:33:53.5172807Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T10:33:53.5173162Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T10:33:53.5173519Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T10:33:53.5173869Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T10:33:53.5174216Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T10:33:53.5184569Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T10:33:53.5213947Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T10:33:53.5214311Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T10:33:53.5214657Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T10:33:53.5214942Z extracting: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T10:33:53.5215271Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T10:33:53.5216033Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T10:33:53.5216342Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_outer_vec.cc 2025-12-04T10:33:53.5216626Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_vec_ext.cc 2025-12-04T10:33:53.5217333Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T10:33:53.5217726Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T10:33:53.5217965Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T10:33:53.5272014Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T10:33:53.5292719Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T10:33:53.5292916Z creating: build/lib/ 2025-12-04T10:33:53.5338338Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T10:33:53.5582014Z inflating: build/lib/libprotobuf.a 2025-12-04T10:33:53.5855289Z inflating: build/lib/libprotoc.a 2025-12-04T10:33:53.5860780Z inflating: build/lib/libpthreadpool.a 2025-12-04T10:33:53.5864879Z inflating: build/lib/libcpuinfo.a 2025-12-04T10:33:53.5868779Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T10:33:53.5869274Z inflating: build/lib/libclog.a 2025-12-04T10:33:53.5879508Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T10:33:53.5880622Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T10:33:53.5890206Z inflating: build/lib/libnnpack.a 2025-12-04T10:33:53.5991335Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T10:33:53.6459968Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T10:33:53.6497686Z inflating: build/lib/libgtest.a 2025-12-04T10:33:53.6506959Z inflating: build/lib/libgmock.a 2025-12-04T10:33:53.6507186Z inflating: build/lib/libgtest_main.a 2025-12-04T10:33:53.6507390Z inflating: build/lib/libgmock_main.a 2025-12-04T10:33:53.6556862Z inflating: build/lib/libXNNPACK.a 2025-12-04T10:33:53.6598105Z inflating: build/lib/libbenchmark.a 2025-12-04T10:33:53.6598433Z inflating: build/lib/libbenchmark_main.a 2025-12-04T10:33:53.6598667Z inflating: build/lib/libjitprofiling.a 2025-12-04T10:33:53.6603102Z inflating: build/lib/libittnotify.a 2025-12-04T10:33:53.6639169Z inflating: build/lib/libasmjit.a 2025-12-04T10:33:53.7262239Z inflating: build/lib/libfbgemm.a 2025-12-04T10:33:53.7278839Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T10:33:53.7575491Z inflating: build/lib/libtensorpipe.a 2025-12-04T10:33:53.7641757Z inflating: build/lib/libgloo.a 2025-12-04T10:33:53.7667329Z inflating: build/lib/libonnx_proto.a 2025-12-04T10:33:53.7889124Z inflating: build/lib/libgloo_hip.a 2025-12-04T10:33:53.8282647Z inflating: build/lib/libonnx.a 2025-12-04T10:33:54.3817113Z inflating: build/lib/libdnnl.a 2025-12-04T10:33:54.3827048Z inflating: build/lib/libfmt.a 2025-12-04T10:33:54.3996672Z inflating: build/lib/libkineto.a 2025-12-04T10:33:54.4060994Z inflating: build/lib/libc10.so 2025-12-04T10:33:54.4061374Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T10:33:54.4062529Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T10:33:54.4087615Z inflating: build/lib/libc10_hip.so 2025-12-04T10:33:54.4360610Z inflating: build/lib/libfbgemm_genai.a 
2025-12-04T10:33:56.1291342Z inflating: build/lib/libtorch_cpu.so 2025-12-04T10:33:56.1292983Z inflating: build/lib/libshm.so 2025-12-04T10:33:56.9578713Z inflating: build/lib/libtorch_hip.so 2025-12-04T10:33:56.9579250Z inflating: build/lib/libtorch.so 2025-12-04T10:33:56.9589542Z inflating: build/lib/libjitbackend_test.so 2025-12-04T10:33:56.9602747Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T10:33:56.9642322Z inflating: build/lib/libtorchbind_test.so 2025-12-04T10:33:56.9656992Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T10:33:57.0947744Z inflating: build/lib/libtorch_python.so 2025-12-04T10:33:57.0967707Z inflating: build/lib/libnnapi_backend.so 2025-12-04T10:33:57.0968031Z creating: build/bin/ 2025-12-04T10:33:57.0968284Z creating: build/bin/CMakeFiles/ 2025-12-04T10:33:57.0969105Z inflating: build/bin/cmake_install.cmake 2025-12-04T10:33:57.0969401Z inflating: build/bin/CTestTestfile.cmake 2025-12-04T10:33:57.1221129Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T10:33:57.1473255Z inflating: build/bin/protoc 2025-12-04T10:33:57.1505998Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T10:33:57.1536809Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T10:33:57.1568292Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T10:33:57.1599979Z inflating: build/bin/c10_Device_test 2025-12-04T10:33:57.1636037Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T10:33:57.1668533Z inflating: build/bin/c10_Scalar_test 2025-12-04T10:33:57.1698649Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T10:33:57.1732977Z inflating: build/bin/c10_SymInt_test 2025-12-04T10:33:57.1767352Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T10:33:57.1799817Z inflating: build/bin/c10_Bitset_test 2025-12-04T10:33:57.1841759Z inflating: build/bin/c10_cow_test 2025-12-04T10:33:57.1874845Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T10:33:57.1908915Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T10:33:57.1939252Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T10:33:57.1969396Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T10:33:57.1999963Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T10:33:57.2032867Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T10:33:57.2063985Z inflating: build/bin/c10_Half_test 2025-12-04T10:33:57.2099299Z inflating: build/bin/c10_Enumerate_test 2025-12-04T10:33:57.2133837Z inflating: build/bin/c10_LeftRight_test 2025-12-04T10:33:57.2165940Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T10:33:57.2196325Z inflating: build/bin/c10_Semaphore_test 2025-12-04T10:33:57.2227122Z inflating: build/bin/c10_Synchronized_test 2025-12-04T10:33:57.2258966Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T10:33:57.2292713Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T10:33:57.2324366Z inflating: build/bin/c10_accumulate_test 2025-12-04T10:33:57.2358464Z inflating: build/bin/c10_bfloat16_test 2025-12-04T10:33:57.2388896Z inflating: build/bin/c10_error_test 2025-12-04T10:33:57.2419897Z inflating: build/bin/c10_bit_cast_test 2025-12-04T10:33:57.2453545Z inflating: build/bin/c10_complex_test 2025-12-04T10:33:57.2485663Z inflating: build/bin/c10_exception_test 2025-12-04T10:33:57.2520202Z inflating: build/bin/c10_complex_math_test 2025-12-04T10:33:57.2551193Z inflating: build/bin/c10_flags_test 2025-12-04T10:33:57.2582464Z inflating: build/bin/c10_irange_test 2025-12-04T10:33:57.2613432Z inflating: build/bin/c10_generic_math_test 2025-12-04T10:33:57.2702893Z 
inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T10:33:57.2737565Z inflating: build/bin/c10_logging_test 2025-12-04T10:33:57.2768225Z inflating: build/bin/c10_nofatal_test 2025-12-04T10:33:57.2800858Z inflating: build/bin/c10_lazy_test 2025-12-04T10:33:57.2838489Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T10:33:57.2871038Z inflating: build/bin/c10_registry_test 2025-12-04T10:33:57.2902581Z inflating: build/bin/c10_ssize_test 2025-12-04T10:33:57.2947200Z inflating: build/bin/c10_optional_test 2025-12-04T10:33:57.3034639Z inflating: build/bin/c10_small_vector_test 2025-12-04T10:33:57.3069013Z inflating: build/bin/c10_string_util_test 2025-12-04T10:33:57.3099674Z inflating: build/bin/c10_tempfile_test 2025-12-04T10:33:57.3129763Z inflating: build/bin/c10_string_view_test 2025-12-04T10:33:57.3156649Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T10:33:57.3190637Z inflating: build/bin/c10_typeid_test 2025-12-04T10:33:57.3221086Z inflating: build/bin/c10_hip_HIPAssertionsTest_1_var_test 2025-12-04T10:33:57.3251091Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_stream 2025-12-04T10:33:57.3281748Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-12-04T10:33:57.3311165Z inflating: build/bin/c10_hip_HIPAssertionsTest_from_2_processes 2025-12-04T10:33:57.3341213Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T10:33:57.3371310Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T10:33:57.3401400Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-12-04T10:33:57.3431639Z inflating: build/bin/c10_hip_HIPTest 2025-12-04T10:33:57.3758504Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T10:33:57.4092818Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T10:33:57.4433885Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T10:33:57.4491257Z inflating: build/bin/test_aoti_abi_check 2025-12-04T10:33:57.4521961Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T10:33:57.4552359Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T10:33:57.4582939Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T10:33:57.4615038Z inflating: build/bin/BackoffTest 2025-12-04T10:33:57.4647418Z inflating: build/bin/FileStoreTest 2025-12-04T10:33:57.4682026Z inflating: build/bin/TCPStoreTest 2025-12-04T10:33:57.4714760Z inflating: build/bin/HashStoreTest 2025-12-04T10:33:57.4755081Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T10:33:57.4756627Z inflating: build/bin/example_allreduce 2025-12-04T10:33:57.4758663Z inflating: build/bin/torch_shm_manager 2025-12-04T10:33:57.4791674Z inflating: build/bin/static_runtime_bench 2025-12-04T10:33:57.4935148Z inflating: build/bin/static_runtime_test 2025-12-04T10:33:57.4978641Z inflating: build/bin/Dict_test 2025-12-04T10:33:57.5010888Z inflating: build/bin/Dimname_test 2025-12-04T10:33:57.5049863Z inflating: build/bin/MaybeOwned_test 2025-12-04T10:33:57.5084548Z inflating: build/bin/NamedTensor_test 2025-12-04T10:33:57.5120338Z inflating: build/bin/apply_utils_test 2025-12-04T10:33:57.5156106Z inflating: build/bin/atest 2025-12-04T10:33:57.5194654Z inflating: build/bin/basic 2025-12-04T10:33:57.5228107Z inflating: build/bin/broadcast_test 2025-12-04T10:33:57.5259246Z inflating: build/bin/cpu_allocator_test 2025-12-04T10:33:57.5294604Z inflating: build/bin/cpu_generator_test 2025-12-04T10:33:57.5326680Z inflating: 
build/bin/cpu_profiling_allocator_test 2025-12-04T10:33:57.5381648Z inflating: build/bin/cpu_rng_test 2025-12-04T10:33:57.5413358Z inflating: build/bin/dlconvertor_test 2025-12-04T10:33:57.5448268Z inflating: build/bin/extension_backend_test 2025-12-04T10:33:57.5482191Z inflating: build/bin/half_test 2025-12-04T10:33:57.5539509Z inflating: build/bin/ivalue_test 2025-12-04T10:33:57.5570033Z inflating: build/bin/lazy_tensor_test 2025-12-04T10:33:57.5602241Z inflating: build/bin/math_kernel_test 2025-12-04T10:33:57.5634420Z inflating: build/bin/memory_format_test 2025-12-04T10:33:57.5667148Z inflating: build/bin/memory_overlapping_test 2025-12-04T10:33:57.5699536Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T10:33:57.5733511Z inflating: build/bin/native_test 2025-12-04T10:33:57.5764850Z inflating: build/bin/operator_name_test 2025-12-04T10:33:57.5795842Z inflating: build/bin/operators_test 2025-12-04T10:33:57.5827621Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T10:33:57.5868153Z inflating: build/bin/pow_test 2025-12-04T10:33:57.5902685Z inflating: build/bin/quantized_test 2025-12-04T10:33:57.5933275Z inflating: build/bin/reduce_ops_test 2025-12-04T10:33:57.5964451Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T10:33:57.5998352Z inflating: build/bin/scalar_tensor_test 2025-12-04T10:33:57.6033231Z inflating: build/bin/scalar_test 2025-12-04T10:33:57.6064755Z inflating: build/bin/StorageUtils_test 2025-12-04T10:33:57.6096603Z inflating: build/bin/stride_properties_test 2025-12-04T10:33:57.6143718Z inflating: build/bin/tensor_iterator_test 2025-12-04T10:33:57.6176617Z inflating: build/bin/test_parallel 2025-12-04T10:33:57.6208245Z inflating: build/bin/thread_init_test 2025-12-04T10:33:57.6241640Z inflating: build/bin/type_ptr_test 2025-12-04T10:33:57.6277531Z inflating: build/bin/type_test 2025-12-04T10:33:57.6309450Z inflating: build/bin/undefined_tensor_test 2025-12-04T10:33:57.6339860Z inflating: build/bin/verify_api_visibility 2025-12-04T10:33:57.6382356Z inflating: build/bin/legacy_vmap_test 2025-12-04T10:33:57.6413805Z inflating: build/bin/weakref_test 2025-12-04T10:33:57.6445293Z inflating: build/bin/wrapdim_test 2025-12-04T10:33:57.6506816Z inflating: build/bin/List_test 2025-12-04T10:33:57.6538175Z inflating: build/bin/xla_tensor_test 2025-12-04T10:33:57.6574036Z inflating: build/bin/IListRef_test 2025-12-04T10:33:57.6643482Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T10:33:57.6683177Z inflating: build/bin/KernelFunction_test 2025-12-04T10:33:57.6739833Z inflating: build/bin/kernel_function_test 2025-12-04T10:33:57.6813173Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T10:33:57.6873091Z inflating: build/bin/kernel_lambda_test 2025-12-04T10:33:57.6909397Z inflating: build/bin/kernel_stackbased_test 2025-12-04T10:33:57.6965592Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T10:33:57.6996874Z inflating: build/bin/CppSignature_test 2025-12-04T10:33:57.7027052Z inflating: build/bin/op_allowlist_test 2025-12-04T10:33:57.7204215Z inflating: build/bin/op_registration_test 2025-12-04T10:33:57.7233796Z inflating: build/bin/hip_complex_math_test 2025-12-04T10:33:57.7267271Z inflating: build/bin/backend_fallback_test 2025-12-04T10:33:57.7297674Z inflating: build/bin/hip_complex_test 2025-12-04T10:33:57.7337919Z inflating: build/bin/inline_container_test 2025-12-04T10:33:57.7370168Z inflating: build/bin/hip_apply_test 2025-12-04T10:33:57.7400402Z inflating: build/bin/hip_distributions_test 2025-12-04T10:33:57.7430458Z 
inflating: build/bin/hip_generator_test 2025-12-04T10:33:57.7460438Z inflating: build/bin/hip_half_test 2025-12-04T10:33:57.7490521Z inflating: build/bin/hip_integer_divider_test 2025-12-04T10:33:57.7520576Z inflating: build/bin/hip_optional_test 2025-12-04T10:33:57.7550638Z inflating: build/bin/hip_packedtensoraccessor_test 2025-12-04T10:33:57.7580711Z inflating: build/bin/hip_vectorized_test 2025-12-04T10:33:57.7612325Z inflating: build/bin/hip_dlconvertor_test 2025-12-04T10:33:57.8231200Z inflating: build/bin/test_jit 2025-12-04T10:33:57.8429443Z inflating: build/bin/test_lazy 2025-12-04T10:33:57.8463803Z inflating: build/bin/test_dist_autograd 2025-12-04T10:33:57.8504968Z inflating: build/bin/test_cpp_rpc 2025-12-04T10:33:57.8505982Z inflating: build/bin/parallel_benchmark 2025-12-04T10:33:57.9162306Z inflating: build/bin/test_api 2025-12-04T10:33:57.9162717Z creating: .additional_ci_files/ 2025-12-04T10:33:57.9198259Z inflating: .additional_ci_files/test-times.json 2025-12-04T10:33:57.9329827Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T10:33:57.9355998Z ##[group]Run rm artifacts.zip 2025-12-04T10:33:57.9356195Z rm artifacts.zip 2025-12-04T10:33:57.9361187Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:33:57.9361401Z env: 2025-12-04T10:33:57.9361538Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:57.9361721Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:57.9361958Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:57.9362185Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:57.9363052Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:57.9363706Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:57.9363927Z AWS_REGION: us-east-1 2025-12-04T10:33:57.9364123Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:57.9364332Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:57.9366437Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:57.9366561Z ##[endgroup] 2025-12-04T10:33:58.0301591Z ##[group]Run df -H 2025-12-04T10:33:58.0301781Z df -H 2025-12-04T10:33:58.0307361Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:33:58.0307562Z env: 2025-12-04T10:33:58.0307695Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:58.0307881Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:58.0308117Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:58.0308354Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:58.0309018Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:58.0309887Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:58.0310011Z AWS_REGION: us-east-1 2025-12-04T10:33:58.0310224Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:58.0310388Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:58.0312546Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:58.0312660Z ##[endgroup] 2025-12-04T10:33:58.0668503Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T10:33:58.0669004Z overlay 16T 460G 15T 3% 
/ 2025-12-04T10:33:58.0669366Z tmpfs 68M 0 68M 0% /dev 2025-12-04T10:33:58.0669870Z /dev/md0 16T 460G 15T 3% /run 2025-12-04T10:33:58.0670228Z shm 68M 17k 68M 1% /dev/shm 2025-12-04T10:33:58.0670822Z amdprj2-k8s_2 5.5T 120G 5.4T 3% /home/runner/pytorch-data 2025-12-04T10:33:58.0671480Z tmpfs 3.3T 13k 3.3T 1% /run/secrets/kubernetes.io/serviceaccount 2025-12-04T10:33:58.0671969Z tmpfs 1.7T 0 1.7T 0% /proc/acpi 2025-12-04T10:33:58.0672343Z tmpfs 1.7T 0 1.7T 0% /proc/scsi 2025-12-04T10:33:58.0672727Z tmpfs 1.7T 0 1.7T 0% /sys/firmware 2025-12-04T10:33:58.0673151Z tmpfs 1.7T 0 1.7T 0% /sys/devices/virtual/powercap 2025-12-04T10:33:58.0701812Z Prepare all required actions 2025-12-04T10:33:58.0702034Z Getting action download info 2025-12-04T10:33:58.4392831Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T10:33:58.4392989Z with: 2025-12-04T10:33:58.4393079Z env: 2025-12-04T10:33:58.4393181Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:58.4393326Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:58.4393512Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:58.4393677Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:58.4394194Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:58.4394702Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:58.4394849Z AWS_REGION: us-east-1 2025-12-04T10:33:58.4395045Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:58.4395207Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:58.4397241Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:58.4397350Z ##[endgroup] 2025-12-04T10:33:58.4410826Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T10:33:58.4410972Z with: 2025-12-04T10:33:58.4411073Z name: td_results 2025-12-04T10:33:58.4411182Z s3-bucket: gha-artifacts 2025-12-04T10:33:58.4411300Z region: us-east-1 2025-12-04T10:33:58.4411404Z env: 2025-12-04T10:33:58.4411502Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:58.4411646Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:58.4411836Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:58.4412014Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:58.4412527Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:58.4413021Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:58.4413147Z AWS_REGION: us-east-1 2025-12-04T10:33:58.4413288Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:58.4413449Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:58.4415431Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:58.4415545Z ##[endgroup] 2025-12-04T10:33:58.6742369Z (node:17098) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T10:33:58.6742658Z 2025-12-04T10:33:58.6743008Z Please migrate your code to use AWS SDK for JavaScript (v3). 
2025-12-04T10:33:58.6743356Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T10:33:58.6743710Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T10:33:58.9650074Z Found 1 objects with prefix pytorch/pytorch/19922849170/td_results/ 2025-12-04T10:33:58.9650484Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-12-04T10:33:59.4216374Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-12-04T10:33:59.4220168Z Artifact download has finished successfully 2025-12-04T10:33:59.4411089Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T10:33:59.4411273Z mkdir -p .additional_ci_files 2025-12-04T10:33:59.4411453Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T10:33:59.4416014Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:33:59.4416172Z env: 2025-12-04T10:33:59.4416298Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:59.4416445Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:59.4416628Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:59.4416801Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:59.4417487Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:59.4417983Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:59.4418107Z AWS_REGION: us-east-1 2025-12-04T10:33:59.4418371Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:59.4418543Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:59.4420593Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:59.4420706Z ##[endgroup] 2025-12-04T10:33:59.4473369Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T10:33:59.4473527Z .github/scripts/parse_ref.py 2025-12-04T10:33:59.4476028Z shell: /usr/bin/bash -e {0} 2025-12-04T10:33:59.4476143Z env: 2025-12-04T10:33:59.4476241Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:59.4476381Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:59.4476563Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:59.4476733Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:59.4477248Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:59.4477744Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:59.4477864Z AWS_REGION: us-east-1 2025-12-04T10:33:59.4478016Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:59.4478178Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:59.4480285Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:59.4480394Z ##[endgroup] 2025-12-04T10:33:59.4575949Z Setting output branch=main 2025-12-04T10:33:59.4644951Z Prepare all required actions 2025-12-04T10:33:59.4645210Z Getting action download info 2025-12-04T10:33:59.6895302Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T10:33:59.6895451Z with: 2025-12-04T10:33:59.6895730Z github-token: *** 2025-12-04T10:33:59.6898708Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": 
"linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T10:33:59.6902117Z job-name: linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:33:59.6902334Z env: 2025-12-04T10:33:59.6902427Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:59.6902565Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:59.6902741Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:59.6902906Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:59.6903417Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin 
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:59.6903909Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:59.6904166Z AWS_REGION: us-east-1 2025-12-04T10:33:59.6904291Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:59.6904443Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:59.6906426Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:59.6906528Z ##[endgroup] 2025-12-04T10:33:59.6935174Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T10:33:59.6935297Z with: 2025-12-04T10:33:59.6935380Z shell: bash 2025-12-04T10:33:59.6935473Z timeout_minutes: 10 2025-12-04T10:33:59.6935571Z max_attempts: 5 2025-12-04T10:33:59.6935667Z retry_wait_seconds: 30 2025-12-04T10:33:59.6935959Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T10:33:59.6936257Z polling_interval_seconds: 1 2025-12-04T10:33:59.6936438Z warning_on_retry: true 2025-12-04T10:33:59.6936541Z continue_on_error: false 2025-12-04T10:33:59.6936642Z env: 2025-12-04T10:33:59.6936732Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:33:59.6936863Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:33:59.6937038Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:33:59.6937198Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:33:59.6937702Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:33:59.6938187Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:33:59.6938296Z AWS_REGION: us-east-1 2025-12-04T10:33:59.6938424Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:33:59.6938578Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:33:59.6940625Z AWS_SESSION_TOKEN: *** 2025-12-04T10:33:59.6940788Z GITHUB_TOKEN: *** 2025-12-04T10:33:59.6940883Z ##[endgroup] 2025-12-04T10:33:59.7340498Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T10:33:59.8743590Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T10:33:59.9638570Z Collecting requests==2.27.1 2025-12-04T10:34:00.0140354Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T10:34:00.0238199Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.1/63.1 KB 6.3 MB/s eta 0:00:00 2025-12-04T10:34:00.0671724Z Collecting pyyaml==6.0.2 2025-12-04T10:34:00.0724602Z Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB) 2025-12-04T10:34:00.1155097Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 18.8 MB/s eta 0:00:00 2025-12-04T10:34:00.1333483Z Collecting idna<4,>=2.5 2025-12-04T10:34:00.1386713Z Downloading idna-3.11-py3-none-any.whl (71 kB) 2025-12-04T10:34:00.1406152Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.0/71.0 KB 57.7 MB/s eta 0:00:00 2025-12-04T10:34:00.1579080Z Collecting certifi>=2017.4.17 2025-12-04T10:34:00.1632394Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T10:34:00.1692411Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.4/159.4 KB 29.5 MB/s eta 0:00:00 2025-12-04T10:34:00.1954372Z Collecting urllib3<1.27,>=1.21.1 2025-12-04T10:34:00.2010893Z Downloading urllib3-1.26.20-py2.py3-none-any.whl (144 kB) 2025-12-04T10:34:00.2067231Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.2/144.2 KB 28.8 MB/s eta 
0:00:00 2025-12-04T10:34:00.2969181Z Collecting charset-normalizer~=2.0.0 2025-12-04T10:34:00.3024865Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T10:34:00.3550486Z Installing collected packages: urllib3, pyyaml, idna, charset-normalizer, certifi, requests 2025-12-04T10:34:00.4475796Z WARNING: The script normalizer is installed in '/home/runner/.local/bin' which is not on PATH. 2025-12-04T10:34:00.4476734Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-12-04T10:34:00.4640930Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 idna-3.11 pyyaml-6.0.2 requests-2.27.1 urllib3-1.26.20 2025-12-04T10:34:00.7344594Z Command completed after 1 attempt(s). 2025-12-04T10:34:00.7390808Z ##[group]Run set -x 2025-12-04T10:34:00.7390977Z set -x 2025-12-04T10:34:00.7391106Z  2025-12-04T10:34:00.7391299Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T10:34:00.7391529Z # in runner workspace 2025-12-04T10:34:00.7391726Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T10:34:00.7396761Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:34:00.7396952Z env: 2025-12-04T10:34:00.7397071Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:34:00.7397253Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:34:00.7397651Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:34:00.7397880Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:34:00.7398399Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:34:00.7398903Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:34:00.7399025Z AWS_REGION: us-east-1 2025-12-04T10:34:00.7399205Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:34:00.7399368Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:34:00.7401389Z AWS_SESSION_TOKEN: *** 2025-12-04T10:34:00.7401498Z ##[endgroup] 2025-12-04T10:34:00.7421954Z + python3 /home/runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T10:34:00.7510528Z Setting output branch=main 2025-12-04T10:34:00.7547037Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T10:34:00.7547286Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T10:34:00.7547457Z echo "Job name: ${JOB_NAME}" 2025-12-04T10:34:00.7547611Z  2025-12-04T10:34:00.7547807Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T10:34:00.7548046Z # in runner workspace 2025-12-04T10:34:00.7548266Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T10:34:00.7548499Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T10:34:00.7548673Z  --job-name "${JOB_NAME}" \ 2025-12-04T10:34:00.7552772Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, 
{"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" \ 2025-12-04T10:34:00.7556367Z  --selected-test-configs "" \ 2025-12-04T10:34:00.7556503Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T10:34:00.7556634Z  --tag "${TAG}" \ 2025-12-04T10:34:00.7556757Z  --event-name "${EVENT_NAME}" \ 2025-12-04T10:34:00.7556887Z  --schedule "${SCHEDULE}" \ 2025-12-04T10:34:00.7557012Z  --branch "${HEAD_BRANCH}" 2025-12-04T10:34:00.7561296Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:34:00.7561450Z env: 2025-12-04T10:34:00.7561546Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:34:00.7561688Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:34:00.7561880Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:34:00.7562050Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:34:00.7562570Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:34:00.7563097Z AWS_DEFAULT_REGION: 
us-east-1 2025-12-04T10:34:00.7563215Z AWS_REGION: us-east-1 2025-12-04T10:34:00.7563383Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:34:00.7563539Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:34:00.7565519Z AWS_SESSION_TOKEN: *** 2025-12-04T10:34:00.7565723Z GITHUB_TOKEN: *** 2025-12-04T10:34:00.7565924Z JOB_NAME: linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:34:00.7566132Z PR_NUMBER: 2025-12-04T10:34:00.7566221Z TAG: 2025-12-04T10:34:00.7566305Z EVENT_NAME: schedule 2025-12-04T10:34:00.7566404Z SCHEDULE: 29 8 * * * 2025-12-04T10:34:00.7566501Z HEAD_BRANCH: main 2025-12-04T10:34:00.7566597Z ##[endgroup] 2025-12-04T10:34:00.7586594Z Workflow: trunk-rocm-mi300 2025-12-04T10:34:00.7586813Z Job name: linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:34:01.3420283Z INFO:root:Issue https://github.com/pytorch/pytorch/issues/167616 created by jithunnair-amd has unstable all the test jobs for trunk-rocm-mi300 / linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:34:01.3649753Z Setting output keep-going=True 2025-12-04T10:34:01.3650068Z Setting output ci-verbose-test-logs=False 2025-12-04T10:34:01.3650321Z Setting output ci-test-showlocals=False 2025-12-04T10:34:01.3650809Z Setting output ci-no-test-timeout=False 2025-12-04T10:34:01.3651018Z Setting output ci-no-td=False 2025-12-04T10:34:01.3651218Z Setting output ci-td-distributed=False 2025-12-04T10:34:01.3651422Z Setting output is-unstable=True 2025-12-04T10:34:01.3651625Z Setting output reenabled-issues= 2025-12-04T10:34:01.3662068Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, 
"num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, 
{"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T10:34:01.3671008Z Setting output is-test-matrix-empty=False 2025-12-04T10:34:01.3758874Z ##[group]Run echo "Filtered matrix:" 2025-12-04T10:34:01.3759130Z echo "Filtered matrix:" 2025-12-04T10:34:01.3768002Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": 
"linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": 
"distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "linux.rocm.gpu.gfx942.4.b", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" 2025-12-04T10:34:01.3775200Z  2025-12-04T10:34:01.3775293Z echo 2025-12-04T10:34:01.3775413Z echo "Is the current job unstable? True" 2025-12-04T10:34:01.3775551Z  2025-12-04T10:34:01.3775633Z echo 2025-12-04T10:34:01.3775743Z echo "Is keep-going label set? True" 2025-12-04T10:34:01.3775872Z  2025-12-04T10:34:01.3775956Z echo 2025-12-04T10:34:01.3776059Z echo "Reenabled issues? " 2025-12-04T10:34:01.3780507Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:34:01.3780661Z env: 2025-12-04T10:34:01.3780764Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:34:01.3780903Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:34:01.3781091Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:34:01.3781260Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:34:01.3781761Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:34:01.3782248Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:34:01.3782373Z AWS_REGION: us-east-1 2025-12-04T10:34:01.3782556Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:34:01.3782781Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:34:01.3784753Z AWS_SESSION_TOKEN: *** 2025-12-04T10:34:01.3784863Z ##[endgroup] 2025-12-04T10:34:01.3804818Z Filtered matrix: 2025-12-04T10:34:01.3818477Z {include: [{config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: default, shard: 3, 
num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, rerun_disabled_tests: rerun_disabled_tests, 
unstable: unstable, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, mem_leak_check: mem_leak_check, unstable: unstable}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: linux.rocm.gpu.gfx942.4.b, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}]} 2025-12-04T10:34:01.3827365Z 2025-12-04T10:34:01.3827435Z Is the current job unstable? True 2025-12-04T10:34:01.3827544Z 2025-12-04T10:34:01.3827603Z Is keep-going label set? True 2025-12-04T10:34:01.3827708Z 2025-12-04T10:34:01.3827761Z Reenabled issues? 2025-12-04T10:34:01.3854814Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T10:34:01.3855052Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T10:34:01.3860087Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:34:01.3860242Z env: 2025-12-04T10:34:01.3860343Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:34:01.3860486Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:34:01.3860671Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:34:01.3860842Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:34:01.3861356Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:34:01.3861873Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:34:01.3861995Z AWS_REGION: us-east-1 2025-12-04T10:34:01.3862175Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:34:01.3862333Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:34:01.3864361Z AWS_SESSION_TOKEN: *** 2025-12-04T10:34:01.3864471Z JOB_TIMEOUT: 600 2025-12-04T10:34:01.3864575Z ##[endgroup] 2025-12-04T10:34:01.3912538Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:34:01.3912817Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:34:01.3913057Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T10:34:01.3917659Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T10:34:01.3917857Z env: 2025-12-04T10:34:01.3917988Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:34:01.3918174Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:34:01.3918411Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:34:01.3918630Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:34:01.3919261Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:34:01.3919838Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:34:01.3919966Z 
AWS_REGION: us-east-1 2025-12-04T10:34:01.3920145Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:34:01.3920310Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:34:01.3922358Z AWS_SESSION_TOKEN: *** 2025-12-04T10:34:01.3922475Z ##[endgroup] 2025-12-04T10:34:01.4001551Z ##[group]Run set -x 2025-12-04T10:34:01.4001705Z set -x 2025-12-04T10:34:01.4001806Z  2025-12-04T10:34:01.4001922Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T10:34:01.4002086Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T10:34:01.4002250Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T10:34:01.4002411Z  TEST_COMMAND=.ci/caffe2/test.sh 2025-12-04T10:34:01.4002552Z else 2025-12-04T10:34:01.4002663Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T10:34:01.4002789Z fi 2025-12-04T10:34:01.4002881Z  2025-12-04T10:34:01.4003017Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T10:34:01.4003223Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T10:34:01.4003407Z # Used for GPU_FLAG since that doesn't play nice 2025-12-04T10:34:01.4003580Z # shellcheck disable=SC2086,SC2090 2025-12-04T10:34:01.4003719Z container_name=$(docker run \ 2025-12-04T10:34:01.4003850Z  ${GPU_FLAG:-} \ 2025-12-04T10:34:01.4003972Z  -e BUILD_ENVIRONMENT \ 2025-12-04T10:34:01.4004098Z  -e PR_NUMBER \ 2025-12-04T10:34:01.4004214Z  -e GITHUB_ACTIONS \ 2025-12-04T10:34:01.4004334Z  -e GITHUB_REPOSITORY \ 2025-12-04T10:34:01.4014603Z  -e GITHUB_WORKFLOW \ 2025-12-04T10:34:01.4014747Z  -e GITHUB_JOB \ 2025-12-04T10:34:01.4015003Z  -e GITHUB_RUN_ID \ 2025-12-04T10:34:01.4015125Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T10:34:01.4015255Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T10:34:01.4015379Z  -e JOB_ID \ 2025-12-04T10:34:01.4015485Z  -e JOB_NAME \ 2025-12-04T10:34:01.4015597Z  -e BASE_SHA \ 2025-12-04T10:34:01.4015706Z  -e BRANCH \ 2025-12-04T10:34:01.4015810Z  -e SHA1 \ 2025-12-04T10:34:01.4015926Z  -e AWS_DEFAULT_REGION \ 2025-12-04T10:34:01.4016050Z  -e IN_WHEEL_TEST \ 2025-12-04T10:34:01.4016169Z  -e SHARD_NUMBER \ 2025-12-04T10:34:01.4016287Z  -e TEST_CONFIG \ 2025-12-04T10:34:01.4016403Z  -e NUM_TEST_SHARDS \ 2025-12-04T10:34:01.4016526Z  -e REENABLED_ISSUES \ 2025-12-04T10:34:01.4016654Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T10:34:01.4016784Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T10:34:01.4016907Z  -e TEST_SHOWLOCALS \ 2025-12-04T10:34:01.4017033Z  -e NO_TEST_TIMEOUT \ 2025-12-04T10:34:01.4017152Z  -e NO_TD \ 2025-12-04T10:34:01.4017273Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T10:34:01.4017424Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T10:34:01.4017573Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T10:34:01.4017712Z  -e TESTS_TO_INCLUDE \ 2025-12-04T10:34:01.4017837Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T10:34:01.4017964Z  -e DASHBOARD_TAG \ 2025-12-04T10:34:01.4018116Z  --env-file="${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T10:34:01.4018282Z  --ulimit stack=10485760:83886080 \ 2025-12-04T10:34:01.4018411Z  --ulimit core=0 \ 2025-12-04T10:34:01.4018549Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T10:34:01.4018707Z  --security-opt seccomp=unconfined \ 2025-12-04T10:34:01.4018848Z  --cap-add=SYS_PTRACE \ 2025-12-04T10:34:01.4018977Z  --shm-size="8g" \ 2025-12-04T10:34:01.4019090Z  --tty \ 2025-12-04T10:34:01.4019192Z  --detach \ 2025-12-04T10:34:01.4019307Z  --name="${container_name}" \ 2025-12-04T10:34:01.4019434Z  --user jenkins \ 2025-12-04T10:34:01.4019624Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T10:34:01.4019784Z  -w 
/var/lib/jenkins/workspace \ 2025-12-04T10:34:01.4019975Z  "${DOCKER_IMAGE}" 2025-12-04T10:34:01.4020084Z ) 2025-12-04T10:34:01.4020193Z # save container name for later step 2025-12-04T10:34:01.4020359Z echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV" 2025-12-04T10:34:01.4020636Z # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home 2025-12-04T10:34:01.4020988Z docker exec -t "${container_name}" sh -c "cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" 2025-12-04T10:34:01.4024090Z shell: /usr/bin/bash -e {0} 2025-12-04T10:34:01.4024206Z env: 2025-12-04T10:34:01.4024305Z GIT_DEFAULT_BRANCH: main 2025-12-04T10:34:01.4024448Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T10:34:01.4024630Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T10:34:01.4024800Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T10:34:01.4025315Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T10:34:01.4025808Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T10:34:01.4025927Z AWS_REGION: us-east-1 2025-12-04T10:34:01.4026075Z AWS_ACCESS_KEY_ID: *** 2025-12-04T10:34:01.4026281Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T10:34:01.4028277Z AWS_SESSION_TOKEN: *** 2025-12-04T10:34:01.4028405Z BUILD_ENVIRONMENT: linux-jammy-rocm-py3.10 2025-12-04T10:34:01.4028538Z PR_NUMBER: 2025-12-04T10:34:01.4028644Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T10:34:01.4028777Z GITHUB_WORKFLOW: trunk-rocm-mi300 2025-12-04T10:34:01.4028899Z GITHUB_JOB: test 2025-12-04T10:34:01.4029003Z GITHUB_RUN_ID: 19922849170 2025-12-04T10:34:01.4029118Z GITHUB_RUN_NUMBER: 689 2025-12-04T10:34:01.4029233Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T10:34:01.4029338Z JOB_ID: 57116213187 2025-12-04T10:34:01.4029545Z JOB_NAME: linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:34:01.4029805Z BRANCH: main 2025-12-04T10:34:01.4029919Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:01.4030074Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:01.4030213Z TEST_CONFIG: distributed 2025-12-04T10:34:01.4030326Z SHARD_NUMBER: 2 2025-12-04T10:34:01.4030424Z NUM_TEST_SHARDS: 3 2025-12-04T10:34:01.4030529Z REENABLED_ISSUES: 2025-12-04T10:34:01.4030634Z CONTINUE_THROUGH_ERROR: True 2025-12-04T10:34:01.4030750Z VERBOSE_TEST_LOGS: False 2025-12-04T10:34:01.4030858Z TEST_SHOWLOCALS: False 2025-12-04T10:34:01.4030965Z NO_TEST_TIMEOUT: False 2025-12-04T10:34:01.4031067Z NO_TD: False 2025-12-04T10:34:01.4031340Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:34:01.4031637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T10:34:01.4031769Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T10:34:01.4031891Z TESTS_TO_INCLUDE: 2025-12-04T10:34:01.4031993Z DASHBOARD_TAG: 2025-12-04T10:34:01.4032135Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T10:34:01.4032249Z ##[endgroup] 2025-12-04T10:34:01.4048560Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2025-12-04T10:34:01.4048998Z + [[ linux-jammy-rocm-py3.10 == *onnx* ]] 2025-12-04T10:34:01.4049372Z + 
TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T10:34:01.4055398Z +++ nproc --ignore=2 2025-12-04T10:34:01.4065519Z ++ docker run --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e MAX_JOBS=126 -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e TESTS_TO_INCLUDE -e HUGGING_FACE_HUB_TOKEN -e DASHBOARD_TAG --env-file=/home/runner/_work/_temp/github_env_19922849170 --ulimit stack=10485760:83886080 --ulimit core=0 --env-file=/tmp/github_env_19922849170 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=8g --tty --detach --name= --user jenkins -v /home/runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T10:34:01.5933874Z + container_name=8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T10:34:01.5934267Z + echo CONTAINER_NAME=8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T10:34:01.5934851Z + docker exec -t 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c sh -c 'cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh' 2025-12-04T10:34:04.8557576Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T10:34:05.4105025Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T10:34:05.4106313Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T10:34:05.4107876Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T10:34:05.4110374Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T10:34:05.4112621Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T10:34:05.4113701Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T10:34:05.4274394Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T10:34:05.4298092Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T10:34:05.6241484Z Installing collected packages: torch 2025-12-04T10:34:11.1293427Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T10:34:11.1676440Z + export TERM=vt100 2025-12-04T10:34:11.1676843Z + TERM=vt100 2025-12-04T10:34:11.1680009Z ++ dirname .ci/pytorch/test.sh 2025-12-04T10:34:11.1696934Z + source .ci/pytorch/common.sh 2025-12-04T10:34:11.1699496Z +++ dirname .ci/pytorch/common.sh 2025-12-04T10:34:11.1707856Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T10:34:11.1709865Z +++ declare -f -t trap_add 2025-12-04T10:34:11.1715920Z ++ set -ex -o pipefail 2025-12-04T10:34:11.1716229Z ++ [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-12-04T10:34:11.1716487Z ++ unset HIP_PLATFORM 2025-12-04T10:34:11.1716695Z ++ export PYTORCH_TEST_WITH_ROCM=1 2025-12-04T10:34:11.1716928Z ++ PYTORCH_TEST_WITH_ROCM=1 2025-12-04T10:34:11.1717147Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T10:34:11.1721129Z ++ dirname .ci/pytorch/test.sh 2025-12-04T10:34:11.1728325Z + source .ci/pytorch/common-build.sh 2025-12-04T10:34:11.1729857Z ++ [[ linux-jammy-rocm-py3.10 != *win-* ]] 2025-12-04T10:34:11.1736243Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T10:34:11.1745710Z +++ cd .ci/pytorch 2025-12-04T10:34:11.1745895Z +++ pwd -P 2025-12-04T10:34:11.1748276Z ++ script_dir=/var/lib/jenkins/pytorch/.ci/pytorch 2025-12-04T10:34:11.1748560Z ++ [[ linux-jammy-rocm-py3.10 == *-pch* ]] 2025-12-04T10:34:11.1748751Z ++ which sccache 2025-12-04T10:34:11.1760989Z ++ [[ -z '' ]] 2025-12-04T10:34:11.1761199Z ++ unset SCCACHE_BUCKET 2025-12-04T10:34:11.1761353Z ++ unset SCCACHE_REGION 2025-12-04T10:34:11.1761513Z ++ sccache --stop-server 2025-12-04T10:34:11.1782475Z ++ true 2025-12-04T10:34:11.1782649Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T10:34:11.1792683Z ++ trap_add sccache_epilogue EXIT 2025-12-04T10:34:11.1792881Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T10:34:11.1793042Z ++ shift 
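The `trap_add sccache_epilogue EXIT` trace above shows a common bash pattern: read back whatever EXIT handler is already installed with `trap -p`, then re-install it with the new command chained on, so sccache teardown is appended rather than clobbering earlier cleanup. A minimal, self-contained sketch of that pattern follows; the helper name and the sed-based parsing are illustrative only, not the repository's exact common_utils.sh implementation.

#!/usr/bin/env bash
# Sketch only: chain a new command onto an existing trap without clobbering it.
trap_add_sketch() {
  local new_cmd=$1; shift
  local sig existing
  for sig in "$@"; do
    # 'trap -p SIG' prints e.g.: trap -- 'old_cmd' EXIT ; extract old_cmd (may be empty).
    existing=$(trap -p "$sig" | sed -n "s/^trap -- '\(.*\)' $sig\$/\1/p")
    # Re-install with the new command appended, so earlier handlers still run first.
    trap -- "${existing:+$existing; }$new_cmd" "$sig"
  done
}

first_cleanup()  { echo "first cleanup";  }
second_cleanup() { echo "second cleanup"; }
trap_add_sketch first_cleanup  EXIT
trap_add_sketch second_cleanup EXIT   # on exit, both handlers run in registration order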
2025-12-04T10:34:11.1793191Z ++ for trap_add_name in "$@" 2025-12-04T10:34:11.1800844Z ++++ trap -p EXIT 2025-12-04T10:34:11.1803144Z +++ eval 'extract_trap_cmd ' 2025-12-04T10:34:11.1803320Z ++++ extract_trap_cmd 2025-12-04T10:34:11.1803464Z ++++ printf '%s\n' '' 2025-12-04T10:34:11.1804002Z +++ printf '%s\n' sccache_epilogue 2025-12-04T10:34:11.1805936Z ++ trap -- ' 2025-12-04T10:34:11.1806078Z sccache_epilogue' EXIT 2025-12-04T10:34:11.1806226Z ++ [[ -n '' ]] 2025-12-04T10:34:11.1806382Z ++ [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-12-04T10:34:11.1806620Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T10:34:11.1806826Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T10:34:11.1806986Z ++ sccache --start-server 2025-12-04T10:34:11.1824585Z sccache: Starting the server... 2025-12-04T10:34:11.2010615Z sccache: Listening on address 127.0.0.1:4226 2025-12-04T10:34:11.2018176Z ++ sccache --zero-stats 2025-12-04T10:34:11.2033411Z Statistics zeroed. 2025-12-04T10:34:11.2035621Z ++ which ccache 2025-12-04T10:34:11.2043370Z + [[ linux-jammy-rocm-py3.10 != *rocm* ]] 2025-12-04T10:34:11.2043576Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-12-04T10:34:11.2043757Z + echo 'Environment variables:' 2025-12-04T10:34:11.2043927Z Environment variables: 2025-12-04T10:34:11.2044069Z + env 2025-12-04T10:34:11.2050582Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-12-04T10:34:11.2050786Z CONTINUE_THROUGH_ERROR=True 2025-12-04T10:34:11.2050977Z BUILD_ENVIRONMENT=linux-jammy-rocm-py3.10 2025-12-04T10:34:11.2051204Z HOSTNAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-5l4hk 2025-12-04T10:34:11.2051513Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2051780Z GITHUB_ACTION=__run_2 2025-12-04T10:34:11.2051929Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T10:34:11.2052082Z GITHUB_RUN_NUMBER=689 2025-12-04T10:34:11.2052218Z TEST_CONFIG=distributed 2025-12-04T10:34:11.2052392Z RUNNER_NAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-5l4hk 2025-12-04T10:34:11.2052595Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T10:34:11.2052756Z AWS_DEFAULT_REGION=us-east-1 2025-12-04T10:34:11.2052939Z RUNNER_ARTIFACT_DIR=/home/runner/_work/_temp/artifacts 2025-12-04T10:34:11.2053141Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-12-04T10:34:11.2053305Z GITHUB_REF_TYPE=branch 2025-12-04T10:34:11.2053516Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2053890Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T10:34:11.2056666Z *** 2025-12-04T10:34:11.2056790Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T10:34:11.2056941Z GITHUB_ACTIONS=true 2025-12-04T10:34:11.2057092Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2057286Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2057563Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk-rocm-mi300.yml@refs/heads/main 2025-12-04T10:34:11.2057808Z UCC_HOME=/usr 2025-12-04T10:34:11.2057925Z RUNNER_ENVIRONMENT=self-hosted 2025-12-04T10:34:11.2058065Z VERBOSE_TEST_LOGS=False 2025-12-04T10:34:11.2058187Z GITHUB_REF=refs/heads/main 2025-12-04T10:34:11.2058313Z RUNNER_OS=Linux 2025-12-04T10:34:11.2058420Z SHARD_NUMBER=2 2025-12-04T10:34:11.2058535Z GITHUB_REF_PROTECTED=true 2025-12-04T10:34:11.2058814Z RUNNER_MANUALLY_TRAP_SIG=1 2025-12-04T10:34:11.2058939Z HOME=/var/lib/jenkins 2025-12-04T10:34:11.2059083Z GITHUB_API_URL=https://api.github.com 2025-12-04T10:34:11.2059241Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T10:34:11.2059401Z 
RUNNER_DOCS_DIR=/home/runner/_work/_temp/docs 2025-12-04T10:34:11.2059547Z LANG=C.UTF-8 2025-12-04T10:34:11.2059725Z UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e 2025-12-04T10:34:11.2059890Z PYTORCH_TEST_WITH_ROCM=1 2025-12-04T10:34:11.2060063Z RUNNER_TRACKING_ID=github_4b208c78-f2ba-477a-8e64-14a9af1f4823 2025-12-04T10:34:11.2060234Z RUNNER_ARCH=X64 2025-12-04T10:34:11.2060360Z RUNNER_TEMP=/home/runner/_work/_temp 2025-12-04T10:34:11.2060494Z NUM_TEST_SHARDS=3 2025-12-04T10:34:11.2060607Z UCX_HOME=/usr 2025-12-04T10:34:11.2060826Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2061188Z JOB_NAME=linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:34:11.2061436Z MAGMA_HOME=/opt/rocm/magma 2025-12-04T10:34:11.2061654Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2061940Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-12-04T10:34:11.2062125Z GITHUB_EVENT_NAME=schedule 2025-12-04T10:34:11.2062312Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.12.1 2025-12-04T10:34:11.2062505Z DASHBOARD_TAG= 2025-12-04T10:34:11.2062666Z GITHUB_RUN_ID=19922849170 2025-12-04T10:34:11.2062904Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2063164Z GITHUB_ACTOR=pytorchmergebot 2025-12-04T10:34:11.2063295Z PR_NUMBER= 2025-12-04T10:34:11.2063403Z GITHUB_RUN_ATTEMPT=1 2025-12-04T10:34:11.2063532Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T10:34:11.2063688Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T10:34:11.2063854Z TERM=vt100 2025-12-04T10:34:11.2063961Z INSTALLED_VISION=yes 2025-12-04T10:34:11.2064078Z BRANCH=main 2025-12-04T10:34:11.2064199Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T10:34:11.2064324Z TESTS_TO_INCLUDE= 2025-12-04T10:34:11.2064513Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-12-04T10:34:11.2064731Z GITHUB_SERVER_URL=https://github.com 2025-12-04T10:34:11.2064889Z PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100 2025-12-04T10:34:11.2065066Z UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77 2025-12-04T10:34:11.2065227Z REENABLED_ISSUES= 2025-12-04T10:34:11.2065335Z SHLVL=1 2025-12-04T10:34:11.2065435Z MAX_JOBS=126 2025-12-04T10:34:11.2065586Z RUNNER_TEST_RESULTS_DIR=/home/runner/_work/_temp/test-results 2025-12-04T10:34:11.2065763Z GITHUB_ACTOR_ID=97764156 2025-12-04T10:34:11.2065896Z RUNNER_TOOL_CACHE=/home/runner/_work/_tool 2025-12-04T10:34:11.2066075Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2066249Z GITHUB_REF_NAME=main 2025-12-04T10:34:11.2066371Z ROCM_PATH=/opt/rocm 2025-12-04T10:34:11.2066490Z GITHUB_JOB=test 2025-12-04T10:34:11.2066604Z NO_TEST_TIMEOUT=False 2025-12-04T10:34:11.2066732Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T10:34:11.2066870Z LC_ALL=C.UTF-8 2025-12-04T10:34:11.2066982Z GITHUB_RETENTION_DAYS=90 2025-12-04T10:34:11.2067123Z RUNNER_WORKSPACE=/home/runner/_work/pytorch 2025-12-04T10:34:11.2067276Z OPENSSL_DIR=/opt/openssl 2025-12-04T10:34:11.2067403Z GITHUB_ACTION_REPOSITORY= 2025-12-04T10:34:11.2067827Z 
PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T10:34:11.2068208Z GITHUB_BASE_REF= 2025-12-04T10:34:11.2068306Z CI=true 2025-12-04T10:34:11.2068403Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T10:34:11.2068521Z JOB_ID=57116213187 2025-12-04T10:34:11.2068618Z GITHUB_HEAD_REF= 2025-12-04T10:34:11.2068715Z GITHUB_ACTION_REF= 2025-12-04T10:34:11.2068859Z TEST_SHOWLOCALS=False 2025-12-04T10:34:11.2068978Z GITHUB_WORKFLOW=trunk-rocm-mi300 2025-12-04T10:34:11.2069106Z DEBIAN_FRONTEND=noninteractive 2025-12-04T10:34:11.2069323Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2069538Z NO_TD=False 2025-12-04T10:34:11.2069702Z OLDPWD=/var/lib/jenkins 2025-12-04T10:34:11.2069806Z _=/usr/bin/env 2025-12-04T10:34:11.2069950Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T10:34:11.2117750Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T10:34:11.2117998Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T10:34:11.2118216Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T10:34:11.2118433Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T10:34:11.2118602Z + BUILD_DIR=build 2025-12-04T10:34:11.2118710Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T10:34:11.2118831Z + BUILD_BIN_DIR=build/bin 2025-12-04T10:34:11.2118938Z + SHARD_NUMBER=2 2025-12-04T10:34:11.2120349Z + NUM_TEST_SHARDS=3 2025-12-04T10:34:11.2120539Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T10:34:11.2120698Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T10:34:11.2120831Z + export VALGRIND=ON 2025-12-04T10:34:11.2120950Z + VALGRIND=ON 2025-12-04T10:34:11.2121086Z + [[ linux-jammy-rocm-py3.10 == *clang9* ]] 2025-12-04T10:34:11.2121496Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-12-04T10:34:11.2121631Z + detect_cuda_arch 2025-12-04T10:34:11.2121754Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-12-04T10:34:11.2121904Z + [[ linux-jammy-rocm-py3.10 == *s390x* ]] 2025-12-04T10:34:11.2122038Z + [[ 0 == \1 ]] 2025-12-04T10:34:11.2122143Z + [[ True == \1 ]] 2025-12-04T10:34:11.2122265Z + [[ linux-jammy-rocm-py3.10 != *bazel* ]] 2025-12-04T10:34:11.2122834Z ++ realpath build/custom_test_artifacts 2025-12-04T10:34:11.2128907Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/pytorch/build/custom_test_artifacts 2025-12-04T10:34:11.2129342Z + [[ -n '' ]] 2025-12-04T10:34:11.2129560Z + echo 'Environment variables' 2025-12-04T10:34:11.2129844Z Environment variables 2025-12-04T10:34:11.2130038Z + env 2025-12-04T10:34:11.2134530Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-12-04T10:34:11.2134856Z CONTINUE_THROUGH_ERROR=True 2025-12-04T10:34:11.2135118Z BUILD_ENVIRONMENT=linux-jammy-rocm-py3.10 2025-12-04T10:34:11.2135458Z HOSTNAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-5l4hk 2025-12-04T10:34:11.2135947Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2136366Z GITHUB_ACTION=__run_2 2025-12-04T10:34:11.2136591Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T10:34:11.2136845Z GITHUB_RUN_NUMBER=689 2025-12-04T10:34:11.2137050Z TEST_CONFIG=distributed 2025-12-04T10:34:11.2137327Z RUNNER_NAME=linux.rocm.gpu.gfx942.4.b-bphpw-runner-5l4hk 
2025-12-04T10:34:11.2137644Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T10:34:11.2137882Z AWS_DEFAULT_REGION=us-east-1 2025-12-04T10:34:11.2138133Z RUNNER_ARTIFACT_DIR=/home/runner/_work/_temp/artifacts 2025-12-04T10:34:11.2138404Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-12-04T10:34:11.2138633Z GITHUB_REF_TYPE=branch 2025-12-04T10:34:11.2138876Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2139319Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T10:34:11.2139635Z *** 2025-12-04T10:34:11.2139806Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T10:34:11.2140007Z GITHUB_ACTIONS=true 2025-12-04T10:34:11.2140221Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2140495Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2140897Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk-rocm-mi300.yml@refs/heads/main 2025-12-04T10:34:11.2141250Z UCC_HOME=/usr 2025-12-04T10:34:11.2141429Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T10:34:11.2141642Z RUNNER_ENVIRONMENT=self-hosted 2025-12-04T10:34:11.2142176Z VERBOSE_TEST_LOGS=False 2025-12-04T10:34:11.2142374Z GITHUB_REF=refs/heads/main 2025-12-04T10:34:11.2142559Z RUNNER_OS=Linux 2025-12-04T10:34:11.2142726Z SHARD_NUMBER=2 2025-12-04T10:34:11.2142898Z GITHUB_REF_PROTECTED=true 2025-12-04T10:34:11.2143090Z RUNNER_MANUALLY_TRAP_SIG=1 2025-12-04T10:34:11.2143279Z HOME=/var/lib/jenkins 2025-12-04T10:34:11.2143496Z GITHUB_API_URL=https://api.github.com 2025-12-04T10:34:11.2143740Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T10:34:11.2143984Z RUNNER_DOCS_DIR=/home/runner/_work/_temp/docs 2025-12-04T10:34:11.2144217Z LANG=C.UTF-8 2025-12-04T10:34:11.2144423Z UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e 2025-12-04T10:34:11.2144676Z PYTORCH_TEST_WITH_ROCM=1 2025-12-04T10:34:11.2144937Z RUNNER_TRACKING_ID=github_4b208c78-f2ba-477a-8e64-14a9af1f4823 2025-12-04T10:34:11.2145206Z RUNNER_ARCH=X64 2025-12-04T10:34:11.2145385Z RUNNER_TEMP=/home/runner/_work/_temp 2025-12-04T10:34:11.2145601Z NUM_TEST_SHARDS=3 2025-12-04T10:34:11.2145775Z UCX_HOME=/usr 2025-12-04T10:34:11.2146121Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2146694Z JOB_NAME=linux-jammy-rocm-py3.10 / test (distributed, 2, 3, linux.rocm.gpu.gfx942.4.b, mem_leak_check, unstable) 2025-12-04T10:34:11.2147088Z MAGMA_HOME=/opt/rocm/magma 2025-12-04T10:34:11.2147436Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2147908Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-12-04T10:34:11.2148124Z GITHUB_EVENT_NAME=schedule 2025-12-04T10:34:11.2148331Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.12.1 2025-12-04T10:34:11.2148550Z DASHBOARD_TAG= 2025-12-04T10:34:11.2148685Z GITHUB_RUN_ID=19922849170 2025-12-04T10:34:11.2148961Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2149272Z GITHUB_ACTOR=pytorchmergebot 2025-12-04T10:34:11.2149424Z PR_NUMBER= 2025-12-04T10:34:11.2149558Z GITHUB_RUN_ATTEMPT=1 2025-12-04T10:34:11.2149744Z VALGRIND=ON 2025-12-04T10:34:11.2149879Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T10:34:11.2150068Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T10:34:11.2150243Z TERM=vt100 2025-12-04T10:34:11.2150375Z INSTALLED_VISION=yes 2025-12-04T10:34:11.2150531Z BRANCH=main 2025-12-04T10:34:11.2150654Z 
OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T10:34:11.2150814Z TESTS_TO_INCLUDE= 2025-12-04T10:34:11.2151029Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-12-04T10:34:11.2151284Z GITHUB_SERVER_URL=https://github.com 2025-12-04T10:34:11.2151476Z PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100 2025-12-04T10:34:11.2151687Z UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77 2025-12-04T10:34:11.2151870Z REENABLED_ISSUES= 2025-12-04T10:34:11.2151995Z SHLVL=1 2025-12-04T10:34:11.2152117Z MAX_JOBS=126 2025-12-04T10:34:11.2152284Z RUNNER_TEST_RESULTS_DIR=/home/runner/_work/_temp/test-results 2025-12-04T10:34:11.2152482Z GITHUB_ACTOR_ID=97764156 2025-12-04T10:34:11.2152638Z RUNNER_TOOL_CACHE=/home/runner/_work/_tool 2025-12-04T10:34:11.2152851Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T10:34:11.2153053Z GITHUB_REF_NAME=main 2025-12-04T10:34:11.2153190Z ROCM_PATH=/opt/rocm 2025-12-04T10:34:11.2153312Z GITHUB_JOB=test 2025-12-04T10:34:11.2153443Z NO_TEST_TIMEOUT=False 2025-12-04T10:34:11.2153588Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T10:34:11.2153755Z LC_ALL=C.UTF-8 2025-12-04T10:34:11.2153879Z GITHUB_RETENTION_DAYS=90 2025-12-04T10:34:11.2154046Z RUNNER_WORKSPACE=/home/runner/_work/pytorch 2025-12-04T10:34:11.2154225Z OPENSSL_DIR=/opt/openssl 2025-12-04T10:34:11.2154373Z GITHUB_ACTION_REPOSITORY= 2025-12-04T10:34:11.2154906Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T10:34:11.2155396Z GITHUB_BASE_REF= 2025-12-04T10:34:11.2155519Z CI=true 2025-12-04T10:34:11.2155649Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T10:34:11.2155801Z JOB_ID=57116213187 2025-12-04T10:34:11.2155925Z GITHUB_HEAD_REF= 2025-12-04T10:34:11.2156056Z GITHUB_ACTION_REF= 2025-12-04T10:34:11.2156181Z TEST_SHOWLOCALS=False 2025-12-04T10:34:11.2156329Z GITHUB_WORKFLOW=trunk-rocm-mi300 2025-12-04T10:34:11.2156487Z DEBIAN_FRONTEND=noninteractive 2025-12-04T10:34:11.2156768Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_c2a8d1b0-3c11-4303-ae05-bde093f6a837 2025-12-04T10:34:11.2157044Z NO_TD=False 2025-12-04T10:34:11.2157160Z OLDPWD=/var/lib/jenkins 2025-12-04T10:34:11.2157302Z _=/usr/bin/env 2025-12-04T10:34:11.2157434Z + echo 'Testing pytorch' 2025-12-04T10:34:11.2157570Z Testing pytorch 2025-12-04T10:34:11.2157755Z + export LANG=C.UTF-8 2025-12-04T10:34:11.2157883Z + LANG=C.UTF-8 2025-12-04T10:34:11.2157981Z + PR_NUMBER= 2025-12-04T10:34:11.2158095Z + [[ distributed == \d\e\f\a\u\l\t ]] 2025-12-04T10:34:11.2158237Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T10:34:11.2158386Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-12-04T10:34:11.2158531Z + export HIP_VISIBLE_DEVICES=0,1,2,3 2025-12-04T10:34:11.2158670Z + HIP_VISIBLE_DEVICES=0,1,2,3 2025-12-04T10:34:11.2158798Z + [[ distributed == \s\l\o\w ]] 2025-12-04T10:34:11.2158946Z + [[ linux-jammy-rocm-py3.10 == *slow-gradcheck* ]] 2025-12-04T10:34:11.2159105Z + [[ linux-jammy-rocm-py3.10 == *cuda* ]] 2025-12-04T10:34:11.2159321Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-12-04T10:34:11.2159472Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T10:34:11.2159676Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T10:34:11.2159822Z + [[ distributed == *crossref* ]] 2025-12-04T10:34:11.2159962Z + [[ linux-jammy-rocm-py3.10 == *rocm* ]] 2025-12-04T10:34:11.2160092Z + export VALGRIND=OFF 
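Just above, the distributed ROCm branch of test.sh exports HIP_VISIBLE_DEVICES=0,1,2,3 and PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda before running the tests. On ROCm builds PyTorch exposes the GPUs through the torch.cuda API, and HIP_VISIBLE_DEVICES restricts which devices the process can enumerate. A quick illustrative sanity check along those lines (assumes a ROCm-enabled torch wheel like the one installed above; the counts are what this 4-GPU gfx942 runner should report, not output taken from this log):

# Sketch: confirm the device mask is honoured by the freshly installed wheel.
# On a 4-GPU node the first command should print 4 and the second should print 2.
HIP_VISIBLE_DEVICES=0,1,2,3 python -c "import torch; print(torch.cuda.device_count())"
HIP_VISIBLE_DEVICES=0,1     python -c "import torch; print(torch.cuda.device_count())"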
2025-12-04T10:34:11.2160207Z + VALGRIND=OFF 2025-12-04T10:34:11.2160309Z + rocminfo 2025-12-04T10:34:11.2254702Z ROCk module version 6.12.12 is loaded 2025-12-04T10:34:11.2979993Z ===================== 2025-12-04T10:34:11.2980436Z HSA System Attributes 2025-12-04T10:34:11.2980727Z ===================== 2025-12-04T10:34:11.2981011Z Runtime Version: 1.18 2025-12-04T10:34:11.2981315Z Runtime Ext Version: 1.14 2025-12-04T10:34:11.2981648Z System Timestamp Freq.: 1000.000000MHz 2025-12-04T10:34:11.2982191Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-12-04T10:34:11.2982774Z Machine Model: LARGE 2025-12-04T10:34:11.2983234Z System Endianness: LITTLE 2025-12-04T10:34:11.2983629Z Mwaitx: DISABLED 2025-12-04T10:34:11.2983954Z XNACK enabled: NO 2025-12-04T10:34:11.2984261Z DMAbuf Support: YES 2025-12-04T10:34:11.2984557Z VMM Support: YES 2025-12-04T10:34:11.2984775Z 2025-12-04T10:34:11.2984881Z ========== 2025-12-04T10:34:11.2985170Z HSA Agents 2025-12-04T10:34:11.2985440Z ========== 2025-12-04T10:34:11.2985723Z ******* 2025-12-04T10:34:11.2985994Z Agent 1 2025-12-04T10:34:11.2986275Z ******* 2025-12-04T10:34:11.2986605Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:34:11.2987032Z Uuid: CPU-XX 2025-12-04T10:34:11.2987458Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:34:11.2987916Z Vendor Name: CPU 2025-12-04T10:34:11.2988359Z Feature: None specified 2025-12-04T10:34:11.2988768Z Profile: FULL_PROFILE 2025-12-04T10:34:11.2989095Z Float Round Mode: NEAR 2025-12-04T10:34:11.2989304Z Max Queue Number: 0(0x0) 2025-12-04T10:34:11.2989740Z Queue Min Size: 0(0x0) 2025-12-04T10:34:11.2989940Z Queue Max Size: 0(0x0) 2025-12-04T10:34:11.2990131Z Queue Type: MULTI 2025-12-04T10:34:11.2990318Z Node: 0 2025-12-04T10:34:11.2990527Z Device Type: CPU 2025-12-04T10:34:11.2990731Z Cache Info: 2025-12-04T10:34:11.2990889Z L1: 49152(0xc000) KB 2025-12-04T10:34:11.2991075Z Chip ID: 0(0x0) 2025-12-04T10:34:11.2991265Z ASIC Revision: 0(0x0) 2025-12-04T10:34:11.2991484Z Cacheline Size: 64(0x40) 2025-12-04T10:34:11.2991684Z Max Clock Freq. (MHz): 3300 2025-12-04T10:34:11.2991886Z BDFID: 0 2025-12-04T10:34:11.2992085Z Internal Node ID: 0 2025-12-04T10:34:11.2992291Z Compute Unit: 64 2025-12-04T10:34:11.2992493Z SIMDs per CU: 0 2025-12-04T10:34:11.2992714Z Shader Engines: 0 2025-12-04T10:34:11.2992916Z Shader Arrs. per Eng.: 0 2025-12-04T10:34:11.2993182Z WatchPts on Addr. 
Ranges:1 2025-12-04T10:34:11.2993401Z Memory Properties: 2025-12-04T10:34:11.2993544Z Features: None 2025-12-04T10:34:11.2993696Z Pool Info: 2025-12-04T10:34:11.2993835Z Pool 1 2025-12-04T10:34:11.2994008Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:34:11.2994208Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:34:11.2994407Z Allocatable: TRUE 2025-12-04T10:34:11.2994609Z Alloc Granule: 4KB 2025-12-04T10:34:11.2994822Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.2995046Z Alloc Alignment: 4KB 2025-12-04T10:34:11.2995257Z Accessible by all: TRUE 2025-12-04T10:34:11.2995435Z Pool 2 2025-12-04T10:34:11.2995617Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:34:11.2995813Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:34:11.2996007Z Allocatable: TRUE 2025-12-04T10:34:11.2996217Z Alloc Granule: 4KB 2025-12-04T10:34:11.2996442Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.2996655Z Alloc Alignment: 4KB 2025-12-04T10:34:11.2996861Z Accessible by all: TRUE 2025-12-04T10:34:11.2997041Z Pool 3 2025-12-04T10:34:11.2997223Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T10:34:11.2997410Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:34:11.2997602Z Allocatable: TRUE 2025-12-04T10:34:11.2997818Z Alloc Granule: 4KB 2025-12-04T10:34:11.2998032Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.2998293Z Alloc Alignment: 4KB 2025-12-04T10:34:11.2998496Z Accessible by all: TRUE 2025-12-04T10:34:11.2998673Z Pool 4 2025-12-04T10:34:11.2998908Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:34:11.2999101Z Size: 1584776988(0x5e75c71c) KB 2025-12-04T10:34:11.2999294Z Allocatable: TRUE 2025-12-04T10:34:11.2999547Z Alloc Granule: 4KB 2025-12-04T10:34:11.2999766Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.2999941Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3000113Z Accessible by all: TRUE 2025-12-04T10:34:11.3000263Z ISA Info: 2025-12-04T10:34:11.3000398Z ******* 2025-12-04T10:34:11.3000521Z Agent 2 2025-12-04T10:34:11.3000630Z ******* 2025-12-04T10:34:11.3000759Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:34:11.3000914Z Uuid: CPU-XX 2025-12-04T10:34:11.3001097Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:34:11.3001265Z Vendor Name: CPU 2025-12-04T10:34:11.3001438Z Feature: None specified 2025-12-04T10:34:11.3001600Z Profile: FULL_PROFILE 2025-12-04T10:34:11.3001772Z Float Round Mode: NEAR 2025-12-04T10:34:11.3001946Z Max Queue Number: 0(0x0) 2025-12-04T10:34:11.3002164Z Queue Min Size: 0(0x0) 2025-12-04T10:34:11.3002326Z Queue Max Size: 0(0x0) 2025-12-04T10:34:11.3002486Z Queue Type: MULTI 2025-12-04T10:34:11.3002675Z Node: 1 2025-12-04T10:34:11.3002829Z Device Type: CPU 2025-12-04T10:34:11.3003086Z Cache Info: 2025-12-04T10:34:11.3003214Z L1: 49152(0xc000) KB 2025-12-04T10:34:11.3003361Z Chip ID: 0(0x0) 2025-12-04T10:34:11.3003531Z ASIC Revision: 0(0x0) 2025-12-04T10:34:11.3003696Z Cacheline Size: 64(0x40) 2025-12-04T10:34:11.3003869Z Max Clock Freq. (MHz): 3300 2025-12-04T10:34:11.3004034Z BDFID: 0 2025-12-04T10:34:11.3004191Z Internal Node ID: 1 2025-12-04T10:34:11.3004356Z Compute Unit: 64 2025-12-04T10:34:11.3004515Z SIMDs per CU: 0 2025-12-04T10:34:11.3004674Z Shader Engines: 0 2025-12-04T10:34:11.3004840Z Shader Arrs. per Eng.: 0 2025-12-04T10:34:11.3005008Z WatchPts on Addr. 
Ranges:1 2025-12-04T10:34:11.3005158Z Memory Properties: 2025-12-04T10:34:11.3005278Z Features: None 2025-12-04T10:34:11.3005434Z Pool Info: 2025-12-04T10:34:11.3005578Z Pool 1 2025-12-04T10:34:11.3005732Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:34:11.3005899Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:34:11.3006058Z Allocatable: TRUE 2025-12-04T10:34:11.3006228Z Alloc Granule: 4KB 2025-12-04T10:34:11.3006409Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.3006584Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3006752Z Accessible by all: TRUE 2025-12-04T10:34:11.3006947Z Pool 2 2025-12-04T10:34:11.3007097Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:34:11.3007257Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:34:11.3007452Z Allocatable: TRUE 2025-12-04T10:34:11.3007621Z Alloc Granule: 4KB 2025-12-04T10:34:11.3007814Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.3008004Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3008182Z Accessible by all: TRUE 2025-12-04T10:34:11.3008355Z Pool 3 2025-12-04T10:34:11.3008489Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T10:34:11.3008644Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:34:11.3008802Z Allocatable: TRUE 2025-12-04T10:34:11.3008963Z Alloc Granule: 4KB 2025-12-04T10:34:11.3009129Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.3009295Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3009458Z Accessible by all: TRUE 2025-12-04T10:34:11.3009647Z Pool 4 2025-12-04T10:34:11.3009821Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:34:11.3009976Z Size: 1585311804(0x5e7df03c) KB 2025-12-04T10:34:11.3010130Z Allocatable: TRUE 2025-12-04T10:34:11.3010292Z Alloc Granule: 4KB 2025-12-04T10:34:11.3010482Z Alloc Recommended Granule:4KB 2025-12-04T10:34:11.3010671Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3010844Z Accessible by all: TRUE 2025-12-04T10:34:11.3010991Z ISA Info: 2025-12-04T10:34:11.3011224Z ******* 2025-12-04T10:34:11.3011327Z Agent 3 2025-12-04T10:34:11.3011471Z ******* 2025-12-04T10:34:11.3011622Z Name: gfx942 2025-12-04T10:34:11.3011780Z Uuid: GPU-41f9686c3d70a95c 2025-12-04T10:34:11.3011938Z Marketing Name: 2025-12-04T10:34:11.3012145Z Vendor Name: AMD 2025-12-04T10:34:11.3012304Z Feature: KERNEL_DISPATCH 2025-12-04T10:34:11.3012486Z Profile: BASE_PROFILE 2025-12-04T10:34:11.3012668Z Float Round Mode: NEAR 2025-12-04T10:34:11.3012833Z Max Queue Number: 128(0x80) 2025-12-04T10:34:11.3013003Z Queue Min Size: 64(0x40) 2025-12-04T10:34:11.3013200Z Queue Max Size: 131072(0x20000) 2025-12-04T10:34:11.3013355Z Queue Type: MULTI 2025-12-04T10:34:11.3013505Z Node: 2 2025-12-04T10:34:11.3013660Z Device Type: GPU 2025-12-04T10:34:11.3013804Z Cache Info: 2025-12-04T10:34:11.3013928Z L1: 32(0x20) KB 2025-12-04T10:34:11.3014068Z L2: 4096(0x1000) KB 2025-12-04T10:34:11.3014208Z L3: 262144(0x40000) KB 2025-12-04T10:34:11.3014353Z Chip ID: 29861(0x74a5) 2025-12-04T10:34:11.3014549Z ASIC Revision: 1(0x1) 2025-12-04T10:34:11.3014711Z Cacheline Size: 128(0x80) 2025-12-04T10:34:11.3014908Z Max Clock Freq. (MHz): 2100 2025-12-04T10:34:11.3015085Z BDFID: 29952 2025-12-04T10:34:11.3015241Z Internal Node ID: 2 2025-12-04T10:34:11.3015399Z Compute Unit: 304 2025-12-04T10:34:11.3015568Z SIMDs per CU: 4 2025-12-04T10:34:11.3015771Z Shader Engines: 32 2025-12-04T10:34:11.3015934Z Shader Arrs. per Eng.: 1 2025-12-04T10:34:11.3016102Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:34:11.3016281Z Coherent Host Access: FALSE 2025-12-04T10:34:11.3016428Z Memory Properties: 2025-12-04T10:34:11.3016559Z Features: KERNEL_DISPATCH 2025-12-04T10:34:11.3016743Z Fast F16 Operation: TRUE 2025-12-04T10:34:11.3016913Z Wavefront Size: 64(0x40) 2025-12-04T10:34:11.3017077Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3017223Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3017352Z x 1024(0x400) 2025-12-04T10:34:11.3017521Z y 1024(0x400) 2025-12-04T10:34:11.3017648Z z 1024(0x400) 2025-12-04T10:34:11.3017790Z Max Waves Per CU: 32(0x20) 2025-12-04T10:34:11.3017947Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:34:11.3018106Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3018253Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3018373Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3030383Z y 65535(0xffff) 2025-12-04T10:34:11.3030536Z z 65535(0xffff) 2025-12-04T10:34:11.3030701Z Max fbarriers/Workgrp: 32 2025-12-04T10:34:11.3030939Z Packet Processor uCode:: 185 2025-12-04T10:34:11.3031122Z SDMA engine uCode:: 24 2025-12-04T10:34:11.3031292Z IOMMU Support:: None 2025-12-04T10:34:11.3031445Z Pool Info: 2025-12-04T10:34:11.3031568Z Pool 1 2025-12-04T10:34:11.3031718Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:34:11.3031890Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3032056Z Allocatable: TRUE 2025-12-04T10:34:11.3032230Z Alloc Granule: 4KB 2025-12-04T10:34:11.3032410Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3032589Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3032763Z Accessible by all: FALSE 2025-12-04T10:34:11.3032916Z Pool 2 2025-12-04T10:34:11.3033066Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:34:11.3033229Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3033388Z Allocatable: TRUE 2025-12-04T10:34:11.3033556Z Alloc Granule: 4KB 2025-12-04T10:34:11.3033729Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3033971Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3034145Z Accessible by all: FALSE 2025-12-04T10:34:11.3034294Z Pool 3 2025-12-04T10:34:11.3034434Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:34:11.3034592Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3034751Z Allocatable: TRUE 2025-12-04T10:34:11.3034922Z Alloc Granule: 4KB 2025-12-04T10:34:11.3035098Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3035271Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3035441Z Accessible by all: FALSE 2025-12-04T10:34:11.3035590Z Pool 4 2025-12-04T10:34:11.3035725Z Segment: GROUP 2025-12-04T10:34:11.3035886Z Size: 64(0x40) KB 2025-12-04T10:34:11.3036043Z Allocatable: FALSE 2025-12-04T10:34:11.3036204Z Alloc Granule: 0KB 2025-12-04T10:34:11.3036379Z Alloc Recommended Granule:0KB 2025-12-04T10:34:11.3036552Z Alloc Alignment: 0KB 2025-12-04T10:34:11.3036764Z Accessible by all: FALSE 2025-12-04T10:34:11.3036915Z ISA Info: 2025-12-04T10:34:11.3037031Z ISA 1 2025-12-04T10:34:11.3037176Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:34:11.3037358Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3037535Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3037715Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3037893Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3038060Z Fast f16: TRUE 2025-12-04T10:34:11.3038227Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3038390Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3038537Z x 1024(0x400) 2025-12-04T10:34:11.3038690Z y 1024(0x400) 2025-12-04T10:34:11.3038832Z z 1024(0x400) 2025-12-04T10:34:11.3038986Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3039140Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3039276Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3039428Z y 65535(0xffff) 2025-12-04T10:34:11.3039614Z z 65535(0xffff) 2025-12-04T10:34:11.3039771Z FBarrier Max Size: 32 2025-12-04T10:34:11.3039920Z ISA 2 2025-12-04T10:34:11.3040074Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:34:11.3040263Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3040441Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3040615Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3040796Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3040962Z Fast f16: TRUE 2025-12-04T10:34:11.3041131Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3041329Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3041469Z x 1024(0x400) 2025-12-04T10:34:11.3041609Z y 1024(0x400) 2025-12-04T10:34:11.3041748Z z 1024(0x400) 2025-12-04T10:34:11.3041901Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3042052Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3042187Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3042330Z y 65535(0xffff) 2025-12-04T10:34:11.3042471Z z 65535(0xffff) 2025-12-04T10:34:11.3042628Z FBarrier Max Size: 32 2025-12-04T10:34:11.3042775Z ******* 2025-12-04T10:34:11.3042888Z Agent 4 2025-12-04T10:34:11.3042999Z ******* 2025-12-04T10:34:11.3043133Z Name: gfx942 2025-12-04T10:34:11.3043293Z Uuid: GPU-e2954cd4b2ef3669 2025-12-04T10:34:11.3043458Z Marketing Name: 2025-12-04T10:34:11.3043626Z Vendor Name: AMD 2025-12-04T10:34:11.3043793Z Feature: KERNEL_DISPATCH 2025-12-04T10:34:11.3043959Z Profile: BASE_PROFILE 2025-12-04T10:34:11.3044167Z Float Round Mode: NEAR 2025-12-04T10:34:11.3044334Z Max Queue Number: 128(0x80) 2025-12-04T10:34:11.3044498Z Queue Min Size: 64(0x40) 2025-12-04T10:34:11.3044657Z Queue Max Size: 131072(0x20000) 2025-12-04T10:34:11.3044811Z Queue Type: MULTI 2025-12-04T10:34:11.3044971Z Node: 3 2025-12-04T10:34:11.3045119Z Device Type: GPU 2025-12-04T10:34:11.3045261Z Cache Info: 2025-12-04T10:34:11.3045386Z L1: 32(0x20) KB 2025-12-04T10:34:11.3045526Z L2: 4096(0x1000) KB 2025-12-04T10:34:11.3045664Z L3: 262144(0x40000) KB 2025-12-04T10:34:11.3045811Z Chip ID: 29861(0x74a5) 2025-12-04T10:34:11.3045972Z ASIC Revision: 1(0x1) 2025-12-04T10:34:11.3046133Z Cacheline Size: 128(0x80) 2025-12-04T10:34:11.3046297Z Max Clock Freq. (MHz): 2100 2025-12-04T10:34:11.3046449Z BDFID: 1280 2025-12-04T10:34:11.3046609Z Internal Node ID: 3 2025-12-04T10:34:11.3046765Z Compute Unit: 304 2025-12-04T10:34:11.3046922Z SIMDs per CU: 4 2025-12-04T10:34:11.3047079Z Shader Engines: 32 2025-12-04T10:34:11.3047242Z Shader Arrs. per Eng.: 1 2025-12-04T10:34:11.3047408Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:34:11.3047580Z Coherent Host Access: FALSE 2025-12-04T10:34:11.3047729Z Memory Properties: 2025-12-04T10:34:11.3047854Z Features: KERNEL_DISPATCH 2025-12-04T10:34:11.3048004Z Fast F16 Operation: TRUE 2025-12-04T10:34:11.3048170Z Wavefront Size: 64(0x40) 2025-12-04T10:34:11.3048335Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3048527Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3048660Z x 1024(0x400) 2025-12-04T10:34:11.3048795Z y 1024(0x400) 2025-12-04T10:34:11.3048928Z z 1024(0x400) 2025-12-04T10:34:11.3049077Z Max Waves Per CU: 32(0x20) 2025-12-04T10:34:11.3049240Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:34:11.3049407Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3049554Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3049719Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3049856Z y 65535(0xffff) 2025-12-04T10:34:11.3049992Z z 65535(0xffff) 2025-12-04T10:34:11.3050151Z Max fbarriers/Workgrp: 32 2025-12-04T10:34:11.3050321Z Packet Processor uCode:: 185 2025-12-04T10:34:11.3050482Z SDMA engine uCode:: 24 2025-12-04T10:34:11.3050636Z IOMMU Support:: None 2025-12-04T10:34:11.3050768Z Pool Info: 2025-12-04T10:34:11.3050871Z Pool 1 2025-12-04T10:34:11.3051003Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:34:11.3051198Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3051358Z Allocatable: TRUE 2025-12-04T10:34:11.3051526Z Alloc Granule: 4KB 2025-12-04T10:34:11.3051697Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3051868Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3052043Z Accessible by all: FALSE 2025-12-04T10:34:11.3052190Z Pool 2 2025-12-04T10:34:11.3052330Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:34:11.3052488Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3052644Z Allocatable: TRUE 2025-12-04T10:34:11.3052810Z Alloc Granule: 4KB 2025-12-04T10:34:11.3052986Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3053157Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3053325Z Accessible by all: FALSE 2025-12-04T10:34:11.3053469Z Pool 3 2025-12-04T10:34:11.3053606Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:34:11.3053764Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3053919Z Allocatable: TRUE 2025-12-04T10:34:11.3054076Z Alloc Granule: 4KB 2025-12-04T10:34:11.3054239Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3054401Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3054563Z Accessible by all: FALSE 2025-12-04T10:34:11.3054710Z Pool 4 2025-12-04T10:34:11.3054837Z Segment: GROUP 2025-12-04T10:34:11.3054982Z Size: 64(0x40) KB 2025-12-04T10:34:11.3055127Z Allocatable: FALSE 2025-12-04T10:34:11.3055285Z Alloc Granule: 0KB 2025-12-04T10:34:11.3055497Z Alloc Recommended Granule:0KB 2025-12-04T10:34:11.3055658Z Alloc Alignment: 0KB 2025-12-04T10:34:11.3055816Z Accessible by all: FALSE 2025-12-04T10:34:11.3055955Z ISA Info: 2025-12-04T10:34:11.3056060Z ISA 1 2025-12-04T10:34:11.3056193Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:34:11.3056357Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3056525Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3056687Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3056854Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3057010Z Fast f16: TRUE 2025-12-04T10:34:11.3057163Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3057312Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3057440Z x 1024(0x400) 2025-12-04T10:34:11.3057571Z y 1024(0x400) 2025-12-04T10:34:11.3057701Z z 1024(0x400) 2025-12-04T10:34:11.3057841Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3057980Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3058134Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3058266Z y 65535(0xffff) 2025-12-04T10:34:11.3058395Z z 65535(0xffff) 2025-12-04T10:34:11.3058539Z FBarrier Max Size: 32 2025-12-04T10:34:11.3058677Z ISA 2 2025-12-04T10:34:11.3058823Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:34:11.3058998Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3059161Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3059319Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3059483Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3059689Z Fast f16: TRUE 2025-12-04T10:34:11.3059847Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3059992Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3060118Z x 1024(0x400) 2025-12-04T10:34:11.3060246Z y 1024(0x400) 2025-12-04T10:34:11.3060375Z z 1024(0x400) 2025-12-04T10:34:11.3060520Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3060659Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3060781Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3060913Z y 65535(0xffff) 2025-12-04T10:34:11.3061042Z z 65535(0xffff) 2025-12-04T10:34:11.3061189Z FBarrier Max Size: 32 2025-12-04T10:34:11.3061329Z ******* 2025-12-04T10:34:11.3061428Z Agent 5 2025-12-04T10:34:11.3061527Z ******* 2025-12-04T10:34:11.3061642Z Name: gfx942 2025-12-04T10:34:11.3061789Z Uuid: GPU-d34a48edc983a6e7 2025-12-04T10:34:11.3061943Z Marketing Name: 2025-12-04T10:34:11.3062096Z Vendor Name: AMD 2025-12-04T10:34:11.3062287Z Feature: KERNEL_DISPATCH 2025-12-04T10:34:11.3062442Z Profile: BASE_PROFILE 2025-12-04T10:34:11.3062599Z Float Round Mode: NEAR 2025-12-04T10:34:11.3062756Z Max Queue Number: 128(0x80) 2025-12-04T10:34:11.3062910Z Queue Min Size: 64(0x40) 2025-12-04T10:34:11.3063059Z Queue Max Size: 131072(0x20000) 2025-12-04T10:34:11.3063208Z Queue Type: MULTI 2025-12-04T10:34:11.3063347Z Node: 4 2025-12-04T10:34:11.3063492Z Device Type: GPU 2025-12-04T10:34:11.3063626Z Cache Info: 2025-12-04T10:34:11.3063742Z L1: 32(0x20) KB 2025-12-04T10:34:11.3063877Z L2: 4096(0x1000) KB 2025-12-04T10:34:11.3064009Z L3: 262144(0x40000) KB 2025-12-04T10:34:11.3064146Z Chip ID: 29861(0x74a5) 2025-12-04T10:34:11.3064294Z ASIC Revision: 1(0x1) 2025-12-04T10:34:11.3064447Z Cacheline Size: 128(0x80) 2025-12-04T10:34:11.3064601Z Max Clock Freq. (MHz): 2100 2025-12-04T10:34:11.3064781Z BDFID: 25856 2025-12-04T10:34:11.3064929Z Internal Node ID: 4 2025-12-04T10:34:11.3065081Z Compute Unit: 304 2025-12-04T10:34:11.3065230Z SIMDs per CU: 4 2025-12-04T10:34:11.3065383Z Shader Engines: 32 2025-12-04T10:34:11.3065543Z Shader Arrs. per Eng.: 1 2025-12-04T10:34:11.3065705Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:34:11.3065866Z Coherent Host Access: FALSE 2025-12-04T10:34:11.3066006Z Memory Properties: 2025-12-04T10:34:11.3066123Z Features: KERNEL_DISPATCH 2025-12-04T10:34:11.3066266Z Fast F16 Operation: TRUE 2025-12-04T10:34:11.3066422Z Wavefront Size: 64(0x40) 2025-12-04T10:34:11.3066586Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3066732Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3066856Z x 1024(0x400) 2025-12-04T10:34:11.3066986Z y 1024(0x400) 2025-12-04T10:34:11.3067113Z z 1024(0x400) 2025-12-04T10:34:11.3067256Z Max Waves Per CU: 32(0x20) 2025-12-04T10:34:11.3067414Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:34:11.3067574Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3067709Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3067824Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3067954Z y 65535(0xffff) 2025-12-04T10:34:11.3068088Z z 65535(0xffff) 2025-12-04T10:34:11.3068234Z Max fbarriers/Workgrp: 32 2025-12-04T10:34:11.3068399Z Packet Processor uCode:: 185 2025-12-04T10:34:11.3068562Z SDMA engine uCode:: 24 2025-12-04T10:34:11.3068718Z IOMMU Support:: None 2025-12-04T10:34:11.3068852Z Pool Info: 2025-12-04T10:34:11.3069046Z Pool 1 2025-12-04T10:34:11.3069182Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:34:11.3069335Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3069486Z Allocatable: TRUE 2025-12-04T10:34:11.3069696Z Alloc Granule: 4KB 2025-12-04T10:34:11.3069861Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3070028Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3070189Z Accessible by all: FALSE 2025-12-04T10:34:11.3070329Z Pool 2 2025-12-04T10:34:11.3070462Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:34:11.3070610Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3070763Z Allocatable: TRUE 2025-12-04T10:34:11.3070919Z Alloc Granule: 4KB 2025-12-04T10:34:11.3071083Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3071246Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3071401Z Accessible by all: FALSE 2025-12-04T10:34:11.3071537Z Pool 3 2025-12-04T10:34:11.3071703Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:34:11.3071848Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3071995Z Allocatable: TRUE 2025-12-04T10:34:11.3072149Z Alloc Granule: 4KB 2025-12-04T10:34:11.3072313Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3072480Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3072638Z Accessible by all: FALSE 2025-12-04T10:34:11.3072774Z Pool 4 2025-12-04T10:34:11.3072898Z Segment: GROUP 2025-12-04T10:34:11.3073039Z Size: 64(0x40) KB 2025-12-04T10:34:11.3073185Z Allocatable: FALSE 2025-12-04T10:34:11.3073346Z Alloc Granule: 0KB 2025-12-04T10:34:11.3073507Z Alloc Recommended Granule:0KB 2025-12-04T10:34:11.3073667Z Alloc Alignment: 0KB 2025-12-04T10:34:11.3073827Z Accessible by all: FALSE 2025-12-04T10:34:11.3073966Z ISA Info: 2025-12-04T10:34:11.3074069Z ISA 1 2025-12-04T10:34:11.3074202Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:34:11.3074365Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3074528Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3074689Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3074852Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3075002Z Fast f16: TRUE 2025-12-04T10:34:11.3075161Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3075305Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3075436Z x 1024(0x400) 2025-12-04T10:34:11.3075565Z y 1024(0x400) 2025-12-04T10:34:11.3075694Z z 1024(0x400) 2025-12-04T10:34:11.3075874Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3076014Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3076137Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3076267Z y 65535(0xffff) 2025-12-04T10:34:11.3076394Z z 65535(0xffff) 2025-12-04T10:34:11.3076538Z FBarrier Max Size: 32 2025-12-04T10:34:11.3076676Z ISA 2 2025-12-04T10:34:11.3076816Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:34:11.3076991Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3077152Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3077312Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3077480Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3077633Z Fast f16: TRUE 2025-12-04T10:34:11.3077784Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3077927Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3078053Z x 1024(0x400) 2025-12-04T10:34:11.3078181Z y 1024(0x400) 2025-12-04T10:34:11.3078346Z z 1024(0x400) 2025-12-04T10:34:11.3078486Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3078623Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3078742Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3078870Z y 65535(0xffff) 2025-12-04T10:34:11.3079002Z z 65535(0xffff) 2025-12-04T10:34:11.3079152Z FBarrier Max Size: 32 2025-12-04T10:34:11.3079292Z ******* 2025-12-04T10:34:11.3079400Z Agent 6 2025-12-04T10:34:11.3079505Z ******* 2025-12-04T10:34:11.3079669Z Name: gfx942 2025-12-04T10:34:11.3079820Z Uuid: GPU-f24a9834b47f1628 2025-12-04T10:34:11.3079980Z Marketing Name: 2025-12-04T10:34:11.3080139Z Vendor Name: AMD 2025-12-04T10:34:11.3080294Z Feature: KERNEL_DISPATCH 2025-12-04T10:34:11.3080451Z Profile: BASE_PROFILE 2025-12-04T10:34:11.3080618Z Float Round Mode: NEAR 2025-12-04T10:34:11.3080773Z Max Queue Number: 128(0x80) 2025-12-04T10:34:11.3080927Z Queue Min Size: 64(0x40) 2025-12-04T10:34:11.3081078Z Queue Max Size: 131072(0x20000) 2025-12-04T10:34:11.3081228Z Queue Type: MULTI 2025-12-04T10:34:11.3081370Z Node: 5 2025-12-04T10:34:11.3081513Z Device Type: GPU 2025-12-04T10:34:11.3081644Z Cache Info: 2025-12-04T10:34:11.3081764Z L1: 32(0x20) KB 2025-12-04T10:34:11.3081895Z L2: 4096(0x1000) KB 2025-12-04T10:34:11.3082026Z L3: 262144(0x40000) KB 2025-12-04T10:34:11.3082158Z Chip ID: 29861(0x74a5) 2025-12-04T10:34:11.3082305Z ASIC Revision: 1(0x1) 2025-12-04T10:34:11.3082496Z Cacheline Size: 128(0x80) 2025-12-04T10:34:11.3082651Z Max Clock Freq. (MHz): 2100 2025-12-04T10:34:11.3082796Z BDFID: 5376 2025-12-04T10:34:11.3082944Z Internal Node ID: 5 2025-12-04T10:34:11.3083098Z Compute Unit: 304 2025-12-04T10:34:11.3083248Z SIMDs per CU: 4 2025-12-04T10:34:11.3083399Z Shader Engines: 32 2025-12-04T10:34:11.3083557Z Shader Arrs. per Eng.: 1 2025-12-04T10:34:11.3083723Z WatchPts on Addr. 
Ranges:4 2025-12-04T10:34:11.3083887Z Coherent Host Access: FALSE 2025-12-04T10:34:11.3084025Z Memory Properties: 2025-12-04T10:34:11.3084146Z Features: KERNEL_DISPATCH 2025-12-04T10:34:11.3084298Z Fast F16 Operation: TRUE 2025-12-04T10:34:11.3084458Z Wavefront Size: 64(0x40) 2025-12-04T10:34:11.3084617Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3084767Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3084894Z x 1024(0x400) 2025-12-04T10:34:11.3085027Z y 1024(0x400) 2025-12-04T10:34:11.3085194Z z 1024(0x400) 2025-12-04T10:34:11.3085338Z Max Waves Per CU: 32(0x20) 2025-12-04T10:34:11.3085495Z Max Work-item Per CU: 2048(0x800) 2025-12-04T10:34:11.3085653Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3085794Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3085921Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3086055Z y 65535(0xffff) 2025-12-04T10:34:11.3086183Z z 65535(0xffff) 2025-12-04T10:34:11.3086340Z Max fbarriers/Workgrp: 32 2025-12-04T10:34:11.3086507Z Packet Processor uCode:: 185 2025-12-04T10:34:11.3086667Z SDMA engine uCode:: 24 2025-12-04T10:34:11.3086837Z IOMMU Support:: None 2025-12-04T10:34:11.3086976Z Pool Info: 2025-12-04T10:34:11.3087085Z Pool 1 2025-12-04T10:34:11.3087218Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T10:34:11.3087374Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3087529Z Allocatable: TRUE 2025-12-04T10:34:11.3087689Z Alloc Granule: 4KB 2025-12-04T10:34:11.3087854Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3088018Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3088180Z Accessible by all: FALSE 2025-12-04T10:34:11.3088316Z Pool 2 2025-12-04T10:34:11.3088446Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T10:34:11.3088600Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3088752Z Allocatable: TRUE 2025-12-04T10:34:11.3088908Z Alloc Granule: 4KB 2025-12-04T10:34:11.3089075Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3089240Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3089434Z Accessible by all: FALSE 2025-12-04T10:34:11.3089617Z Pool 3 2025-12-04T10:34:11.3089748Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T10:34:11.3089894Z Size: 268419072(0xfffc000) KB 2025-12-04T10:34:11.3090043Z Allocatable: TRUE 2025-12-04T10:34:11.3090198Z Alloc Granule: 4KB 2025-12-04T10:34:11.3090363Z Alloc Recommended Granule:2048KB 2025-12-04T10:34:11.3090526Z Alloc Alignment: 4KB 2025-12-04T10:34:11.3090684Z Accessible by all: FALSE 2025-12-04T10:34:11.3090819Z Pool 4 2025-12-04T10:34:11.3090942Z Segment: GROUP 2025-12-04T10:34:11.3091083Z Size: 64(0x40) KB 2025-12-04T10:34:11.3091235Z Allocatable: FALSE 2025-12-04T10:34:11.3091391Z Alloc Granule: 0KB 2025-12-04T10:34:11.3091554Z Alloc Recommended Granule:0KB 2025-12-04T10:34:11.3091717Z Alloc Alignment: 0KB 2025-12-04T10:34:11.3091876Z Accessible by all: FALSE 2025-12-04T10:34:11.3092058Z ISA Info: 2025-12-04T10:34:11.3092166Z ISA 1 2025-12-04T10:34:11.3092300Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T10:34:11.3092469Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3092630Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3092798Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3092972Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3093128Z Fast f16: TRUE 2025-12-04T10:34:11.3093282Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3093428Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3093557Z x 1024(0x400) 2025-12-04T10:34:11.3093693Z y 1024(0x400) 2025-12-04T10:34:11.3093829Z z 1024(0x400) 2025-12-04T10:34:11.3093971Z 
Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3094108Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3094228Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3094357Z y 65535(0xffff) 2025-12-04T10:34:11.3094487Z z 65535(0xffff) 2025-12-04T10:34:11.3094631Z FBarrier Max Size: 32 2025-12-04T10:34:11.3094770Z ISA 2 2025-12-04T10:34:11.3094909Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T10:34:11.3095089Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T10:34:11.3095251Z Profiles: HSA_PROFILE_BASE 2025-12-04T10:34:11.3095418Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3095584Z Default Rounding Mode: NEAR 2025-12-04T10:34:11.3095738Z Fast f16: TRUE 2025-12-04T10:34:11.3095893Z Workgroup Max Size: 1024(0x400) 2025-12-04T10:34:11.3096038Z Workgroup Max Size per Dimension: 2025-12-04T10:34:11.3096206Z x 1024(0x400) 2025-12-04T10:34:11.3096337Z y 1024(0x400) 2025-12-04T10:34:11.3096466Z z 1024(0x400) 2025-12-04T10:34:11.3096609Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T10:34:11.3096751Z Grid Max Size per Dimension: 2025-12-04T10:34:11.3096874Z x 2147483647(0x7fffffff) 2025-12-04T10:34:11.3097006Z y 65535(0xffff) 2025-12-04T10:34:11.3097134Z z 65535(0xffff) 2025-12-04T10:34:11.3097279Z FBarrier Max Size: 32 2025-12-04T10:34:11.3097415Z *** Done *** 2025-12-04T10:34:11.3097524Z + rocminfo 2025-12-04T10:34:11.3097620Z + grep -E 'Name:.*\sgfx|Marketing' 2025-12-04T10:34:11.3949624Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:34:11.3949828Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T10:34:11.3950008Z Name: gfx942 2025-12-04T10:34:11.3950160Z Marketing Name: 2025-12-04T10:34:11.3950307Z Name: gfx942 2025-12-04T10:34:11.3950451Z Marketing Name: 2025-12-04T10:34:11.3950596Z Name: gfx942 2025-12-04T10:34:11.3950842Z Marketing Name: 2025-12-04T10:34:11.3950988Z Name: gfx942 2025-12-04T10:34:11.3951133Z Marketing Name: 2025-12-04T10:34:11.4036433Z + MAYBE_ROCM=rocm/ 2025-12-04T10:34:11.4036633Z + [[ linux-jammy-rocm-py3.10 == *xpu* ]] 2025-12-04T10:34:11.4036795Z + [[ linux-jammy-rocm-py3.10 != *-bazel-* ]] 2025-12-04T10:34:11.4036940Z + pip_install ninja==1.10.2 2025-12-04T10:34:11.4037095Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T10:34:11.4037277Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T10:34:11.5962602Z Collecting ninja==1.10.2 2025-12-04T10:34:11.6221728Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T10:34:11.6311129Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T10:34:11.7987479Z Installing collected packages: ninja 2025-12-04T10:34:11.7987915Z Attempting uninstall: ninja 2025-12-04T10:34:11.7994457Z Found existing installation: ninja 1.11.1.4 2025-12-04T10:34:11.8010674Z Uninstalling ninja-1.11.1.4: 2025-12-04T10:34:11.8049310Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T10:34:11.8165221Z Successfully installed ninja-1.10.2 2025-12-04T10:34:11.8605629Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T10:34:11.8607316Z + 
PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T10:34:11.8608192Z + [[ linux-jammy-rocm-py3.10 == *aarch64* ]] 2025-12-04T10:34:11.8608485Z + [[ linux-jammy-rocm-py3.10 == *asan* ]] 2025-12-04T10:34:11.8608773Z + [[ linux-jammy-rocm-py3.10 == *-debug* ]] 2025-12-04T10:34:11.8609057Z + [[ linux-jammy-rocm-py3.10 != *-bazel-* ]] 2025-12-04T10:34:11.8609458Z + echo 'We are not in debug mode: linux-jammy-rocm-py3.10. Expect the assertion to pass' 2025-12-04T10:34:11.8610040Z We are not in debug mode: linux-jammy-rocm-py3.10. Expect the assertion to pass 2025-12-04T10:34:11.8612988Z + cd test 2025-12-04T10:34:11.8613328Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T10:34:12.7334727Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T10:34:12.7335048Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T10:34:12.7335334Z + [[ distributed == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T10:34:12.7340232Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T10:34:12.7340786Z + [[ distributed == *pr_time_benchmarks* ]] 2025-12-04T10:34:12.7341053Z + [[ distributed == *dynamo_eager* ]] 2025-12-04T10:34:12.7341301Z + [[ distributed == *aot_eager* ]] 2025-12-04T10:34:12.7341522Z + [[ distributed == *aot_inductor* ]] 2025-12-04T10:34:12.7341783Z + [[ distributed == *max_autotune_inductor* ]] 2025-12-04T10:34:12.7342016Z + [[ distributed == *inductor* ]] 2025-12-04T10:34:12.7342231Z + [[ distributed == *dynamic* ]] 2025-12-04T10:34:12.7342439Z + [[ distributed == *cpu* ]] 2025-12-04T10:34:12.7342641Z + [[ distributed == *xpu* ]] 2025-12-04T10:34:12.7342885Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T10:34:12.7359283Z + [[ linux-jammy-rocm-py3.10 == *libtorch* ]] 2025-12-04T10:34:12.7359502Z + [[ linux-jammy-rocm-py3.10 == *-bazel-* ]] 2025-12-04T10:34:12.7366308Z + cd test 2025-12-04T10:34:12.7366510Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T10:34:13.4577963Z PyTorch built with: 2025-12-04T10:34:13.4578141Z - GCC 11.4 2025-12-04T10:34:13.4578249Z - C++ Version: 201703 2025-12-04T10:34:13.4579006Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T10:34:13.4579281Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T10:34:13.4579453Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T10:34:13.4579731Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T10:34:13.4579866Z - NNPACK is enabled 2025-12-04T10:34:13.4579982Z - CPU capability usage: AVX512 2025-12-04T10:34:13.4580113Z - HIP Runtime 7.1.25424 2025-12-04T10:34:13.4580222Z - MIOpen 3.5.1 2025-12-04T10:34:13.4580317Z - Magma 2.9.0 2025-12-04T10:34:13.4581944Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_FBGEMM_GENAI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T10:34:13.4583603Z 2025-12-04T10:34:13.6670003Z + cd test 2025-12-04T10:34:13.6670525Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T10:34:14.3186471Z ATen/Parallel: 2025-12-04T10:34:14.3186944Z at::get_num_threads() : 128 2025-12-04T10:34:14.3187308Z at::get_num_interop_threads() : 128 2025-12-04T10:34:14.3187700Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T10:34:14.3188029Z omp_get_max_threads() : 128 2025-12-04T10:34:14.3188636Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T10:34:14.3189239Z mkl_get_max_threads() : 128 2025-12-04T10:34:14.3189855Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T10:34:14.3191030Z std::thread::hardware_concurrency() : 128 2025-12-04T10:34:14.3191371Z Environment variables: 2025-12-04T10:34:14.3191661Z OMP_NUM_THREADS : [not set] 2025-12-04T10:34:14.3191951Z MKL_NUM_THREADS : [not set] 2025-12-04T10:34:14.3192254Z ATen parallel backend: OpenMP 2025-12-04T10:34:14.3192457Z 2025-12-04T10:34:14.5464366Z + [[ distributed == *numpy_2* ]] 2025-12-04T10:34:14.5464637Z + [[ linux-jammy-rocm-py3.10 == *aarch64* ]] 2025-12-04T10:34:14.5464847Z + [[ distributed == *backward* ]] 2025-12-04T10:34:14.5465085Z + [[ distributed == *libtorch_agnostic_targetting* ]] 2025-12-04T10:34:14.5465299Z + [[ distributed == *xla* ]] 2025-12-04T10:34:14.5465469Z + [[ distributed == *vllm* ]] 2025-12-04T10:34:14.5465633Z + [[ distributed == *executorch* ]] 2025-12-04T10:34:14.5465818Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T10:34:14.5466013Z + [[ distributed == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T10:34:14.5466210Z + [[ linux-jammy-rocm-py3.10 == *libtorch* ]] 2025-12-04T10:34:14.5466412Z + [[ distributed == distributed ]] 2025-12-04T10:34:14.5466583Z + test_distributed 2025-12-04T10:34:14.5466752Z + echo 'Testing distributed python tests' 2025-12-04T10:34:14.5466943Z Testing distributed python tests 2025-12-04T10:34:14.5467185Z + python test/run_test.py --distributed-tests --shard 2 3 --verbose 2025-12-04T10:34:16.2378123Z Excluding distributed/rpc/test_faulty_agent on ROCm 2025-12-04T10:34:16.2378683Z Excluding distributed/rpc/test_tensorpipe_agent on ROCm 2025-12-04T10:34:16.2379788Z Excluding distributed/rpc/test_share_memory on ROCm 2025-12-04T10:34:16.2380273Z Excluding distributed/rpc/cuda/test_tensorpipe_agent on ROCm 2025-12-04T10:34:17.2573698Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-12-04T10:34:17.6176325Z Ignoring disabled issues: [''] 2025-12-04T10:34:17.6224093Z Found test times from artifacts 2025-12-04T10:34:17.6385735Z Found test times from artifacts 2025-12-04T10:34:17.6390576Z Running all tests 2025-12-04T10:34:17.6460001Z Running parallel tests on 1 processes 2025-12-04T10:34:17.6461770Z Name: tests to run (est. 
time: 161.22min) 2025-12-04T10:34:17.6462154Z Serial tests (74): 2025-12-04T10:34:17.6462420Z distributed/test_inductor_collectives 2/2 2025-12-04T10:34:17.6462743Z distributed/_tools/test_fake_collectives 1/1 2025-12-04T10:34:17.6463056Z distributed/test_control_collectives 1/1 2025-12-04T10:34:17.6463384Z distributed/test_collective_utils 1/1 2025-12-04T10:34:17.6463692Z distributed/test_c10d_object_collectives 1/1 2025-12-04T10:34:17.6463975Z distributed/algorithms/test_join 1/1 2025-12-04T10:34:17.6464267Z distributed/tensor/test_dtensor_compile 2/4 2025-12-04T10:34:17.6464594Z distributed/pipelining/test_schedule_multiproc 1/1 2025-12-04T10:34:17.6464913Z distributed/pipelining/test_pipe 1/1 2025-12-04T10:34:17.6465199Z distributed/test_compute_comm_reordering 1/1 2025-12-04T10:34:17.6465495Z distributed/tensor/test_dtensor 3/3 2025-12-04T10:34:17.6465799Z distributed/test_aten_comm_compute_reordering 3/3 2025-12-04T10:34:17.6466111Z distributed/tensor/test_redistribute 2/2 2025-12-04T10:34:17.6466395Z distributed/tensor/test_tensor_ops 3/4 2025-12-04T10:34:17.6466667Z distributed/test_device_mesh 1/2 2025-12-04T10:34:17.6466945Z distributed/tensor/test_convolution_ops 1/1 2025-12-04T10:34:17.6467253Z distributed/tensor/parallel/test_tp_style 1/1 2025-12-04T10:34:17.6467546Z distributed/test_debug 1/1 2025-12-04T10:34:17.6467810Z distributed/test_overlap_bucketing_unit 1/1 2025-12-04T10:34:17.6468173Z distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 2025-12-04T10:34:17.6468545Z distributed/optim/test_named_optimizer 1/1 2025-12-04T10:34:17.6468888Z distributed/checkpoint/_experimental/test_checkpointer 1/1 2025-12-04T10:34:17.6469225Z distributed/tensor/test_api 1/1 2025-12-04T10:34:17.6469485Z distributed/tensor/test_init 1/1 2025-12-04T10:34:17.6470418Z distributed/checkpoint/e2e/test_fine_tuning 1/1 2025-12-04T10:34:17.6470724Z distributed/tensor/test_matrix_ops 1/1 2025-12-04T10:34:17.6471003Z distributed/pipelining/test_stage 1/1 2025-12-04T10:34:17.6471324Z distributed/tensor/parallel/test_tp_random_state 1/1 2025-12-04T10:34:17.6471639Z distributed/checkpoint/test_planner 1/1 2025-12-04T10:34:17.6471949Z distributed/checkpoint/test_dtensor_checkpoint 1/1 2025-12-04T10:34:17.6472265Z distributed/pipelining/test_schedule 1/1 2025-12-04T10:34:17.6472604Z distributed/_composable/fsdp/test_fully_shard_overlap 1/1 2025-12-04T10:34:17.6472921Z distributed/test_run 1/1 2025-12-04T10:34:17.6473167Z distributed/tensor/test_math_ops 1/1 2025-12-04T10:34:17.6473449Z distributed/tensor/test_pointwise_ops 1/1 2025-12-04T10:34:17.6473746Z distributed/checkpoint/test_compatibility 1/1 2025-12-04T10:34:17.6473961Z distributed/_tools/test_mem_tracker 1/1 2025-12-04T10:34:17.6474174Z distributed/elastic/test_control_plane 1/1 2025-12-04T10:34:17.6474378Z distributed/fsdp/test_fsdp_overlap 1/1 2025-12-04T10:34:17.6474576Z distributed/test_functional_api 1/1 2025-12-04T10:34:17.6474840Z distributed/_composable/test_composability/test_2d_composability 1/1 2025-12-04T10:34:17.6475109Z distributed/fsdp/test_fsdp_optim_state 1/1 2025-12-04T10:34:17.6475310Z distributed/tensor/test_view_ops 1/1 2025-12-04T10:34:17.6475514Z distributed/fsdp/test_fsdp_state_dict 2/2 2025-12-04T10:34:17.6475863Z distributed/fsdp/test_fsdp_exec_order 1/1 2025-12-04T10:34:17.6476067Z distributed/test_distributed_spawn 2/7 2025-12-04T10:34:17.6476272Z distributed/test_distributed_spawn 5/7 2025-12-04T10:34:17.6476471Z distributed/fsdp/test_fsdp_input 1/1 2025-12-04T10:34:17.6476672Z 
distributed/fsdp/test_fsdp_traversal 1/1 2025-12-04T10:34:17.6476890Z distributed/fsdp/test_fsdp_ignored_modules 1/1 2025-12-04T10:34:17.6477107Z distributed/fsdp/test_checkpoint_wrapper 1/1 2025-12-04T10:34:17.6477323Z distributed/fsdp/test_fsdp_checkpoint 1/1 2025-12-04T10:34:17.6477528Z distributed/fsdp/test_fsdp_fine_tune 1/1 2025-12-04T10:34:17.6477729Z distributed/test_multi_threaded_pg 1/1 2025-12-04T10:34:17.6477974Z distributed/_composable/fsdp/test_fully_shard_extensions 1/1 2025-12-04T10:34:17.6478255Z distributed/checkpoint/test_file_system_checkpoint_cpu 1/1 2025-12-04T10:34:17.6478496Z distributed/fsdp/test_wrap 1/1 2025-12-04T10:34:17.6478711Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 2025-12-04T10:34:17.6478944Z distributed/fsdp/test_fsdp_hybrid_shard 1/1 2025-12-04T10:34:17.6479187Z distributed/_composable/fsdp/test_fully_shard_training 1/1 2025-12-04T10:34:17.6479439Z distributed/fsdp/test_fsdp_multiple_forward 1/1 2025-12-04T10:34:17.6479717Z distributed/checkpoint/test_state_dict 1/1 2025-12-04T10:34:17.6479920Z distributed/fsdp/test_fsdp_core 1/2 2025-12-04T10:34:17.6480108Z distributed/test_c10d_spawn_ucc 1/1 2025-12-04T10:34:17.6480296Z distributed/test_c10d_gloo 1/1 2025-12-04T10:34:17.6480487Z distributed/test_c10d_ops_nccl 1/1 2025-12-04T10:34:17.6480686Z distributed/elastic/events/lib_test 1/1 2025-12-04T10:34:17.6480889Z distributed/elastic/metrics/api_test 1/1 2025-12-04T10:34:17.6481110Z distributed/elastic/multiprocessing/api_test 1/1 2025-12-04T10:34:17.6481355Z distributed/elastic/timer/local_timer_example 1/1 2025-12-04T10:34:17.6481584Z distributed/elastic/timer/local_timer_test 1/1 2025-12-04T10:34:17.6481815Z distributed/elastic/utils/distributed_test 1/1 2025-12-04T10:34:17.6482028Z distributed/elastic/utils/logging_test 1/1 2025-12-04T10:34:17.6482233Z distributed/elastic/utils/util_test 1/1 2025-12-04T10:34:17.6482430Z Parallel tests (0): 2025-12-04T10:34:17.6482591Z Name: excluded (est. time: 0.0min) 2025-12-04T10:34:17.6482763Z Serial tests (0): 2025-12-04T10:34:17.6482941Z Parallel tests (0): 2025-12-04T10:34:17.6483299Z Running distributed/test_inductor_collectives 2/2 ... [2025-12-04 10:34:17.646370][5222498.625407181] 2025-12-04T10:34:17.6483623Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:34:17.6484131Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_inductor_collectives.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:34:17.646615] 2025-12-04T10:35:47.7124069Z 2025-12-04T10:35:47.7127743Z distributed/test_inductor_collectives 2/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_inductor_collectives_2.2_ff03bed5fd29f50f_.log 2025-12-04T10:35:47.7137589Z Running 28 items in this shard: test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_all_to_all_recompute_is_always_banned_override_with_ac_False, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_all_to_all_single_inductor_split_sizes_none, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_eager_allreduce_inductor_wait, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_eager_async_allreduce_inductor_wait, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_inductor_allreduce_eager_wait, test/distributed/test_inductor_collectives.py::TestCollectivesMultiProc::test_reduce_scatter_tensor_inductor, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_bucket_mode_all, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_bucket_mode_all_custom_ops, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_multidtype_bucket_mode_all_custom_ops_multidtype, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_gather_bucket_path, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_all_reduce_bucket_bucket_mode_all, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_get_world_group_source_GroupMember_WORLD, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_pg_var, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_all_gather_list, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_kwargs_none, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_pg_mode_positional_none, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op0, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_rewrite_dist_allreduce_reduce_op_reduce_op3, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_support_collective_op_with_async_op_False, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_dynamo_trace_all_gather_tensor, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_all_gather_coalesced, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_doesnt_mutate_shared, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_doesnt_mutate_shared_graph_partition, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_single_op, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_inductor_steal_buffer, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reduce_scatter_bucket_bucket_mode_all_custom_ops, test/distributed/test_inductor_collectives.py::TestCollectivesInductor::test_reorder_respects_wait_dep, 
test/distributed/test_inductor_collectives.py::TestSyncDecisionCrossRanks::test_all_reduce_comm_analysis 2025-12-04T10:35:47.7145298Z 2025-12-04T10:35:47.7145650Z Finished distributed/test_inductor_collectives 2/2 ... [2025-12-04 10:35:47.712386][5222588.691423506], took 1.50min 2025-12-04T10:35:47.7146309Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:35:48.9545659Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:35:48.9546292Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T10:35:48.9546777Z Uploading artifacts took 0.00 seconds 2025-12-04T10:35:48.9547294Z Running distributed/_tools/test_fake_collectives 1/1 ... [2025-12-04 10:35:48.954392][5222589.933425658] 2025-12-04T10:35:48.9547812Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:35:48.9548866Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_fake_collectives.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:35:48.954659] 2025-12-04T10:35:51.2729632Z 2025-12-04T10:35:51.2730842Z distributed/_tools/test_fake_collectives 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_fake_collectives_1.1_0fa9d9bee7702c92_.log 2025-12-04T10:35:51.2732154Z Running 1 items in this shard: test/distributed/_tools/test_fake_collectives.py::TestFakeCollectives::test_collectives 2025-12-04T10:35:51.2733198Z 2025-12-04T10:35:51.2733582Z Finished distributed/_tools/test_fake_collectives 1/1 ... [2025-12-04 10:35:51.272630][5222592.251666068], took 0.04min 2025-12-04T10:35:51.2734953Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:35:51.2750016Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:35:51.2752947Z Running distributed/test_control_collectives 1/1 ... [2025-12-04 10:35:51.275144][5222592.254185345] 2025-12-04T10:35:51.2753362Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:35:51.2754941Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_control_collectives.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:35:51.275346] 2025-12-04T10:35:53.4930807Z 2025-12-04T10:35:53.4931280Z distributed/test_control_collectives 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_control_collectives_1.1_39b5fa5d5139d686_.log 2025-12-04T10:35:53.4933763Z Running 13 items in this shard: test/distributed/test_control_collectives.py::TestCollectives::test_all_gather_timeout, test/distributed/test_control_collectives.py::TestCollectives::test_all_sum, test/distributed/test_control_collectives.py::TestCollectives::test_all_sum_timeout, test/distributed/test_control_collectives.py::TestCollectives::test_barrier, test/distributed/test_control_collectives.py::TestCollectives::test_barrier_timeout, test/distributed/test_control_collectives.py::TestCollectives::test_broadcast, test/distributed/test_control_collectives.py::TestCollectives::test_broadcast_timeout, test/distributed/test_control_collectives.py::TestCollectives::test_gather, test/distributed/test_control_collectives.py::TestCollectives::test_gather_timeout, test/distributed/test_control_collectives.py::TestCollectives::test_scatter, test/distributed/test_control_collectives.py::TestCollectives::test_scatter_timeout, test/distributed/test_control_collectives.py::TestCollectives::test_simple_user_func, test/distributed/test_control_collectives.py::TestCollectives::test_unique 2025-12-04T10:35:53.4935980Z 2025-12-04T10:35:53.4936192Z Finished distributed/test_control_collectives 1/1 ... [2025-12-04 10:35:53.492735][5222594.471772094], took 0.04min 2025-12-04T10:35:53.4937030Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:35:53.4951173Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:35:53.4951961Z Running distributed/test_collective_utils 1/1 ... [2025-12-04 10:35:53.495075][5222594.474116213] 2025-12-04T10:35:53.4952202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:35:53.4953952Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_collective_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:35:53.495271] 2025-12-04T10:36:13.4893451Z 2025-12-04T10:36:13.4894747Z distributed/test_collective_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_collective_utils_1.1_fa7a4a2b2eb0275c_.log 2025-12-04T10:36:13.4898242Z Running 9 items in this shard: test/distributed/test_collective_utils.py::TestCollectiveUtils::test_all_gather_result, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_all_gather_result_no_pg, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_all_gather_result_raises_exceptions_from_func, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_broadcast_result, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_broadcast_result_no_pg, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_broadcast_result_raises_exceptions_from_func, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_check_rng_sync_device_cpu, test/distributed/test_collective_utils.py::TestCollectiveUtils::test_check_rng_sync_device_cuda, test/distributed/test_collective_utils.py::TestUtils::test_summarize_ranks 2025-12-04T10:36:13.4902009Z 2025-12-04T10:36:13.4902335Z Finished distributed/test_collective_utils 1/1 ... [2025-12-04 10:36:13.489094][5222614.468131509], took 0.33min 2025-12-04T10:36:13.4903405Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:36:13.4911673Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:36:13.4914019Z Running distributed/test_c10d_object_collectives 1/1 ... [2025-12-04 10:36:13.491314][5222614.47035563] 2025-12-04T10:36:13.4914394Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:36:13.4916437Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_object_collectives.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:36:13.491513] 2025-12-04T10:36:56.0218224Z 2025-12-04T10:36:56.0219827Z distributed/test_c10d_object_collectives 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_object_collectives_1.1_81d285d60c1553a0_.log 2025-12-04T10:36:56.0224318Z Running 9 items in this shard: test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_all_gather_object, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_broadcast_object_list, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_gather_object, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_scatter_object_list, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_send_recv_object_list, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_subpg_all_gather_object, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_subpg_broadcast_object, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_subpg_gather_object, test/distributed/test_c10d_object_collectives.py::TestObjectCollectives::test_subpg_scatter_object 2025-12-04T10:36:56.0226872Z 2025-12-04T10:36:56.0227154Z Finished distributed/test_c10d_object_collectives 1/1 ... 
[2025-12-04 10:36:56.021469][5222657.000506365], took 0.71min 2025-12-04T10:36:56.0228041Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:36:56.0236532Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:36:56.0239078Z Running distributed/algorithms/test_join 1/1 ... [2025-12-04 10:36:56.023814][5222657.002854594] 2025-12-04T10:36:56.0239377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:36:56.0241472Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/algorithms/test_join.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:36:56.024022] 2025-12-04T10:37:39.4578594Z 2025-12-04T10:37:39.4582750Z distributed/algorithms/test_join 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.algorithms.test_join_1.1_964bfaa3d83ab92b_.log 2025-12-04T10:37:39.4586478Z Running 9 items in this shard: test/distributed/algorithms/test_join.py::TestJoin::test_join_kwargs, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinable_disable, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables_throw, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_disable, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_main_hooks, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_post_hooks, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_throw 2025-12-04T10:37:39.4590027Z 2025-12-04T10:37:39.4590406Z Finished distributed/algorithms/test_join 1/1 ... [2025-12-04 10:37:39.457415][5222700.436452288], took 0.72min 2025-12-04T10:37:39.4591385Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:37:39.4593921Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:37:39.4595795Z Running distributed/tensor/test_dtensor_compile 2/4 ... [2025-12-04 10:37:39.459466][5222700.438507841] 2025-12-04T10:37:39.4596194Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:37:39.4598290Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor_compile.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:37:39.459682] 2025-12-04T10:38:09.2253945Z 2025-12-04T10:38:09.2255268Z distributed/tensor/test_dtensor_compile 2/4 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_compile_2.4_d6f1cff278895a1e_.log 2025-12-04T10:38:09.2261283Z Running 12 items in this shard: test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dtensor_basic, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dtensor_dynamic_cat, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dtensor_dynamo_device_mesh_attrs, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dtensor_partial_placement_graph_output, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dynamo_dtensor, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dynamo_dtensor_from_local_redistribute, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dynamo_from_local_grad_placements_sequence_intermediate, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_dynamo_to_local_grad_placements_sequence_intermediate, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_fakify_dtensor, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_placement_compile, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompile::test_tp_compile_comm_reordering_graph_partition, test/distributed/tensor/test_dtensor_compile.py::TestDTensorCompileE2E::test_compile_dtensor_redistribute_backward_use_ca_True 2025-12-04T10:38:09.2265202Z 2025-12-04T10:38:09.2265484Z Finished distributed/tensor/test_dtensor_compile 2/4 ... [2025-12-04 10:38:09.225070][5222730.204108576], took 0.50min 2025-12-04T10:38:09.2266400Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:38:09.2267649Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:38:09.2270680Z Running distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 10:38:09.226923][5222730.205964011] 2025-12-04T10:38:09.2271033Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:38:09.2272456Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_schedule_multiproc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:38:09.227121] 2025-12-04T10:38:30.4284191Z 2025-12-04T10:38:30.4285035Z distributed/pipelining/test_schedule_multiproc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_schedule_multiproc_1.1_f8b75c7df4461a65_.log 2025-12-04T10:38:30.4295172Z Running 34 items in this shard: test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_custom_function_callback, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_forward_only_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_schedule_with_weight_update_mlp_e2e_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class1, 
test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_pipeline_schedule_runtime_custom_sched_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_schedule_with_native_zero_bubble_ScheduleClass0 2025-12-04T10:38:30.4302396Z 2025-12-04T10:38:30.4302597Z Finished distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 10:38:30.428260][5222751.407298343], took 0.35min 2025-12-04T10:38:30.4303190Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:38:30.4303691Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:38:30.4303995Z Running distributed/pipelining/test_pipe 1/1 ... [2025-12-04 10:38:30.430251][5222751.409292877] 2025-12-04T10:38:30.4304250Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:38:30.4305608Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_pipe.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:38:30.430435] 2025-12-04T10:38:33.4571465Z 2025-12-04T10:38:33.4572155Z distributed/pipelining/test_pipe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_pipe_1.1_2be582dd3db3f15a_.log 2025-12-04T10:38:33.4572899Z Running 3 items in this shard: test/distributed/pipelining/test_pipe.py::PipeTests::test_model_split_ModelClass0, test/distributed/pipelining/test_pipe.py::PipeTests::test_model_split_ModelClass1, test/distributed/pipelining/test_pipe.py::PipeTests::test_model_split_ModelClass2 2025-12-04T10:38:33.4573330Z 2025-12-04T10:38:33.4573468Z Finished distributed/pipelining/test_pipe 1/1 ... [2025-12-04 10:38:33.456802][5222754.435839027], took 0.05min 2025-12-04T10:38:33.4575761Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:38:33.4587854Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:38:33.4588792Z Running distributed/test_compute_comm_reordering 1/1 ... [2025-12-04 10:38:33.458790][5222754.437831611] 2025-12-04T10:38:33.4589014Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:38:33.4592178Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_compute_comm_reordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:38:33.459010] 2025-12-04T10:40:14.3257846Z 2025-12-04T10:40:14.3258462Z distributed/test_compute_comm_reordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_compute_comm_reordering_1.1_3dd7817ad3b3b53e_.log 2025-12-04T10:40:14.3260625Z Running 9 items in this shard: test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node_combo_kernels_False, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node_combo_kernels_True, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_inductor_default_comms_ordering, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_nccl_heuristics, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_raise_comms, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap_custom_runtime_estimation, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits_raise_comms 2025-12-04T10:40:14.3262897Z 2025-12-04T10:40:14.3263045Z Finished distributed/test_compute_comm_reordering 1/1 ... [2025-12-04 10:40:14.325489][5222855.304526742], took 1.68min 2025-12-04T10:40:14.3263497Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:40:14.3275345Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:40:14.3277945Z Running distributed/tensor/test_dtensor 3/3 ... [2025-12-04 10:40:14.327714][5222855.306755563] 2025-12-04T10:40:14.3278148Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:40:14.3280336Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:40:14.327913] 2025-12-04T10:41:11.4366145Z 2025-12-04T10:41:11.4367230Z distributed/tensor/test_dtensor 3/3 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_3.3_ee63d11d23a1f90e_.log 2025-12-04T10:41:11.4373237Z Running 25 items in this shard: test/distributed/tensor/test_dtensor.py::DTensorTest::test_dtensor_save_load_import, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local, test/distributed/tensor/test_dtensor.py::DTensorTest::test_from_local_then_to_local, test/distributed/tensor/test_dtensor.py::DTensorTest::test_meta_dtensor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_modules_w_meta_dtensor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_shard_tensor, test/distributed/tensor/test_dtensor.py::DTensorTest::test_to_local, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_constructor, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_dtensor_save_load, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local_negative_dim, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_from_local_then_to_local, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_full_tensor_grad_hint, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_full_tensor_sync, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_shard_tensor_2d, test/distributed/tensor/test_dtensor.py::DTensorTestWithLocalTensor::test_to_local_grad_hint, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_default_value_sub_mesh, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_dtensor_device_mesh_device_conversion, test/distributed/tensor/test_dtensor.py::DTensorMeshTest::test_metadata_consistency_check, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_dtensor_spec_local_shard_offset, test/distributed/tensor/test_dtensor.py::DTensorMeshTestWithLocalTensor::test_inplace_on_local_tensor_view, test/distributed/tensor/test_dtensor.py::TestDTensorPlacementTypesWithLocalTensor::test_split_tensor_1D, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_default_shard_order, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_dtensor_spec_print, test/distributed/tensor/test_dtensor.py::TestDTensorSpec::test_dtensor_spec_with_invalid_shard_order, test/distributed/tensor/test_dtensor.py::TestDTensorSpecWithLocalTensor::test_dtensor_spec_update 2025-12-04T10:41:11.4376540Z 2025-12-04T10:41:11.4376672Z Finished distributed/tensor/test_dtensor 3/3 ... [2025-12-04 10:41:11.436374][5222912.415411396], took 0.95min 2025-12-04T10:41:11.4377109Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:41:11.4380911Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:41:11.4383166Z Running distributed/test_aten_comm_compute_reordering 3/3 ... 
[2025-12-04 10:41:11.438243][5222912.417284332] 2025-12-04T10:41:11.4383388Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:41:11.4385547Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_aten_comm_compute_reordering.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:41:11.438455] 2025-12-04T10:43:29.4169478Z 2025-12-04T10:43:29.4170761Z distributed/test_aten_comm_compute_reordering 3/3 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_aten_comm_compute_reordering_3.3_488211ab712b5ae3_.log 2025-12-04T10:43:29.4177212Z Running 15 items in this shard: test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_basic_all_reduce_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_multidtype_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_no_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_single_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_custom_estimator_for_non_compute_nodes, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_make_graph_view_and_get_subgraph_by_path_custom_module_stack_fn, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_overlap_scheduling_via_config, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits_raise_comms 2025-12-04T10:43:29.4182021Z 2025-12-04T10:43:29.4182232Z Finished distributed/test_aten_comm_compute_reordering 3/3 ... [2025-12-04 10:43:29.416627][5223050.395664135], took 2.30min 2025-12-04T10:43:29.4182888Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:43:29.4185675Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:43:29.4186647Z Running distributed/tensor/test_redistribute 2/2 ... 
[2025-12-04 10:43:29.418527][5223050.397568321] 2025-12-04T10:43:29.4186964Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:43:29.4188916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_redistribute.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:43:29.418747] 2025-12-04T10:44:33.9913644Z 2025-12-04T10:44:33.9914804Z distributed/tensor/test_redistribute 2/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_redistribute_2.2_6e81ee66e2d44373_.log 2025-12-04T10:44:33.9929320Z Running 33 items in this shard: test/distributed/tensor/test_redistribute.py::RedistributeTest::test_one_chunk_mesh, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_shard_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_local_partial_grad_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_local_partial_grad_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_shard_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_dim_alltoall_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_dim_alltoall_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTest::test_redistribute_shard_dim_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_generate_shard_orders, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_distribute_all_combination, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_redistribute_with_partial, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_shard_order_same_data_as_strided_shard, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_one_chunk_mesh, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_shard_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_negative_shard_dim, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_to_partial, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_uneven_sharding, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_partial, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_replicate_forward_backward, 
test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_dim_alltoall_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTestWithLocalTensor::test_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_generate_shard_orders, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute_for_special_placement, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute_with_partial 2025-12-04T10:44:33.9935024Z 2025-12-04T10:44:33.9935166Z Finished distributed/tensor/test_redistribute 2/2 ... [2025-12-04 10:44:33.991153][5223114.970188501], took 1.08min 2025-12-04T10:44:33.9935667Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:44:33.9936069Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:44:33.9937990Z Running distributed/tensor/test_tensor_ops 3/4 ... [2025-12-04 10:44:33.993685][5223114.972726198] 2025-12-04T10:44:33.9938195Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:44:33.9940549Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_tensor_ops.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:44:33.993922] 2025-12-04T10:45:17.5162434Z 2025-12-04T10:45:17.5163416Z distributed/tensor/test_tensor_ops 3/4 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_tensor_ops_3.4_b6a2a119f3629247_.log 2025-12-04T10:45:17.5170146Z Running 17 items in this shard: test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_clone, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_copy_, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_dtensor_dtype_conversion, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_fill_inplace_partial_sum, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_full_like, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_inplace_op, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_new_empty_strided, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTest::test_where_type_promotion, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_aten_contiguous, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_equal, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_full_like, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_index, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_inplace_op, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_scatter, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_where_type_promotion, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_zero_inplace, test/distributed/tensor/test_tensor_ops.py::DistTensorOpsTestWithLocalTensor::test_zeros_like_partial_sum 2025-12-04T10:45:17.5175232Z 2025-12-04T10:45:17.5175448Z Finished distributed/tensor/test_tensor_ops 3/4 ... [2025-12-04 10:45:17.515931][5223158.49496766], took 0.73min 2025-12-04T10:45:17.5176181Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:45:17.5179349Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:45:17.5179802Z Running distributed/test_device_mesh 1/2 ... [2025-12-04 10:45:17.517872][5223158.496913055] 2025-12-04T10:45:17.5180112Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:45:17.5182682Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_device_mesh.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:45:17.518097] 2025-12-04T10:47:06.7533984Z 2025-12-04T10:47:06.7535049Z distributed/test_device_mesh 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_device_mesh_1.2_c017d3d7efedc09c_.log 2025-12-04T10:47:06.7547353Z Running 34 items in this shard: test/distributed/test_device_mesh.py::DeviceMeshSetDeviceTest::test_auto_set_device_from_heuristic, test/distributed/test_device_mesh.py::DeviceMeshSetDeviceTest::test_manual_set_device, test/distributed/test_device_mesh.py::DeviceMeshTest::test_2d_mesh_eager_init_subgroup, test/distributed/test_device_mesh.py::DeviceMeshTest::test_2d_mesh_non_eager_init_subgroup, test/distributed/test_device_mesh.py::DeviceMeshTest::test_assert_invalid_mesh_tensor, test/distributed/test_device_mesh.py::DeviceMeshTest::test_device_mesh_2d, test/distributed/test_device_mesh.py::DeviceMeshTest::test_device_mesh_init_backend, test/distributed/test_device_mesh.py::DeviceMeshTest::test_fake_pg_device_mesh, test/distributed/test_device_mesh.py::DeviceMeshTest::test_from_group_with_invalid_mesh, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_group_and_get_all_groups, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_local_rank, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_local_rank_raises_exception, test/distributed/test_device_mesh.py::DeviceMeshTest::test_get_root_mesh_multiple_independent_meshes, test/distributed/test_device_mesh.py::DeviceMeshTest::test_init_process_group, test/distributed/test_device_mesh.py::InitDeviceMeshTest::test_backend_override_argument_dict_with_idx_and_backend_eager, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_concatenate_3d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_flatten_mesh_1d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_flatten_mesh_4d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_get_item_1d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_get_item_3d_noncontiguous_slicing, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_reconstruct_mesh_with_flatten_dim, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_unflatten_mesh_2d, test/distributed/test_device_mesh.py::TestDeviceMeshGetItem::test_unflatten_mesh_3d, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_mesh_dim_by_name, test/distributed/test_device_mesh.py::TestMeshEnv::test_get_root_mesh, test/distributed/test_device_mesh.py::TestMeshEnv::test_mesh_slice_fake_tensor_mode, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_all_gather_uneven, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_broadcast_1d, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_scatter_nd, test/distributed/test_device_mesh.py::DeviceMeshCollectiveTest::test_scatter_uneven, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_coalesce, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_coalesce_non_coalescible, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_complement_n_group_layout, test/distributed/test_device_mesh.py::CuTeLayoutTest::test_remap_to_tensor 2025-12-04T10:47:06.7554132Z 2025-12-04T10:47:06.7554308Z Finished distributed/test_device_mesh 1/2 ... 
[2025-12-04 10:47:06.753614][5223267.732650515], took 1.82min 2025-12-04T10:47:06.7554939Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:47:06.7557315Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:47:06.7560040Z Running distributed/tensor/test_convolution_ops 1/1 ... [2025-12-04 10:47:06.755888][5223267.734929115] 2025-12-04T10:47:06.7560342Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:47:06.7562440Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_convolution_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:47:06.756125] 2025-12-04T10:48:37.0087935Z 2025-12-04T10:48:37.0089116Z distributed/tensor/test_convolution_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_convolution_ops_1.1_6498de81e25b02fc_.log 2025-12-04T10:48:37.0097624Z Running 16 items in this shard: test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv1d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv2d_module_no_bias, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv2d_no_bias_backward, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv2d_no_bias_compile, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv3d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_conv_backward_none_grad_inp, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_depthwise_convolution, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTest::test_downsampling_convolution, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv1d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv2d_module_no_bias, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv2d_no_bias_backward, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv2d_no_bias_compile, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv3d, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_conv_backward_none_grad_inp, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_depthwise_convolution, test/distributed/tensor/test_convolution_ops.py::DistConvolutionOpsTestWithLocalTensor::test_downsampling_convolution 2025-12-04T10:48:37.0101448Z 2025-12-04T10:48:37.0101666Z Finished distributed/tensor/test_convolution_ops 1/1 ... [2025-12-04 10:48:37.008418][5223357.987457145], took 1.50min 2025-12-04T10:48:37.0102335Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:48:37.0102922Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:48:37.0105133Z Running distributed/tensor/parallel/test_tp_style 1/1 ... 
[2025-12-04 10:48:37.010348][5223357.98938905] 2025-12-04T10:48:37.0105451Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:48:37.0106760Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_style.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:48:37.010537] 2025-12-04T10:49:34.5124015Z 2025-12-04T10:49:34.5127201Z distributed/tensor/parallel/test_tp_style 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_tp_style_1.1_3604b571f850ed4b_.log 2025-12-04T10:49:34.5133851Z Running 18 items in this shard: test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_sequence_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_sequence_parallel_style 2025-12-04T10:49:34.5140170Z 2025-12-04T10:49:34.5140452Z Finished distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 10:49:34.512160][5223415.491195863], took 0.96min 2025-12-04T10:49:34.5141136Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:49:34.5146503Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:49:34.5148922Z Running distributed/test_debug 1/1 ... 
[2025-12-04 10:49:34.514792][5223415.493833389] 2025-12-04T10:49:34.5149172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:49:34.5151119Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_debug.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:34.514987] 2025-12-04T10:49:36.9331249Z 2025-12-04T10:49:36.9332357Z distributed/test_debug 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_debug_1.1_ae75ebf1c3ebb08e_.log 2025-12-04T10:49:36.9333206Z Running 1 items in this shard: test/distributed/test_debug.py::TestDebug::test_all 2025-12-04T10:49:36.9333487Z 2025-12-04T10:49:36.9333745Z Finished distributed/test_debug 1/1 ... [2025-12-04 10:49:36.932844][5223417.91188011], took 0.04min 2025-12-04T10:49:36.9343351Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:49:36.9354050Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:49:36.9356650Z Running distributed/test_overlap_bucketing_unit 1/1 ... [2025-12-04 10:49:36.935536][5223417.914576695] 2025-12-04T10:49:36.9357040Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:49:36.9358849Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_overlap_bucketing_unit.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:36.935747] 2025-12-04T10:49:42.9091604Z 2025-12-04T10:49:42.9092523Z distributed/test_overlap_bucketing_unit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_overlap_bucketing_unit_1.1_763225eee4d9b259_.log 2025-12-04T10:49:42.9094643Z Running 9 items in this shard: test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_all_reduce, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_independent_collectives, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_multidtype_collectives, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_with_convert_dtype_as_hiding_nodes, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_can_bucket_with_multiple_hiding_nodes, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_cant_bucket_ag_with_rs_hiding_interval_between_final_mm_hidden_False, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_cant_bucket_ag_with_rs_hiding_interval_between_final_mm_hidden_True, test/distributed/test_overlap_bucketing_unit.py::TestOverlapPreservingBucketing::test_cant_bucket_nested_hiding_intervals, test/distributed/test_overlap_bucketing_unit.py::TestCrossPGOverlap::test_cross_pg_prefetch_during_exposed_wait 2025-12-04T10:49:42.9096911Z 2025-12-04T10:49:42.9097060Z Finished distributed/test_overlap_bucketing_unit 1/1 ... 
[2025-12-04 10:49:42.908927][5223423.887965904], took 0.10min 2025-12-04T10:49:42.9097558Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:49:42.9107514Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:49:42.9110178Z Running distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 ... [2025-12-04 10:49:42.910929][5223423.889969468] 2025-12-04T10:49:42.9110435Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:49:42.9112208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpoint_writer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:42.911123] 2025-12-04T10:49:45.1792749Z 2025-12-04T10:49:45.1794206Z distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpoint_writer_1.1_94fc265d2f2ccc8a_.log 2025-12-04T10:49:45.1798996Z Running 8 items in this shard: test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriterConfig::test_custom_values, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriterConfig::test_default_values, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_close, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_calls_barrier, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_calls_commit_hooks, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_creates_checkpoint_file, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_without_barrier, test/distributed/checkpoint/_experimental/test_checkpoint_writer.py::TestCheckpointWriter::test_write_without_commit_hook 2025-12-04T10:49:45.1802093Z 2025-12-04T10:49:45.1802456Z Finished distributed/checkpoint/_experimental/test_checkpoint_writer 1/1 ... [2025-12-04 10:49:45.178980][5223426.158017733], took 0.04min 2025-12-04T10:49:45.1803468Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:49:45.1807858Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:49:45.1810166Z Running distributed/optim/test_named_optimizer 1/1 ... [2025-12-04 10:49:45.180937][5223426.159978858] 2025-12-04T10:49:45.1810506Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:49:45.1812694Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/optim/test_named_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:49:45.181156] 2025-12-04T10:49:46.3681954Z 2025-12-04T10:49:46.3682849Z distributed/optim/test_named_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.optim.test_named_optimizer_1.1_14d96f15037e7378_.log 2025-12-04T10:49:46.3683412Z 2025-12-04T10:49:46.3683673Z Finished distributed/optim/test_named_optimizer 1/1 ... [2025-12-04 10:49:46.367922][5223427.346957067], took 0.02min 2025-12-04T10:49:46.3688861Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:49:46.3699879Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:49:46.3702392Z Running distributed/checkpoint/_experimental/test_checkpointer 1/1 ... [2025-12-04 10:49:46.370151][5223427.349192458] 2025-12-04T10:49:46.3702743Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:49:46.3704801Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpointer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:46.370367] 2025-12-04T10:50:07.0660661Z 2025-12-04T10:50:07.0661695Z distributed/checkpoint/_experimental/test_checkpointer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpointer_1.1_4c484a42515acebe_.log 2025-12-04T10:50:07.0666777Z Running 11 items in this shard: test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_load_strict_mode, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_load_with_map_location, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_nested_dict_partial_load, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_partial_load, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_save_and_load_basic, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestCheckpointer::test_save_with_kwargs, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_error_handling, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_future_results, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_multiple_saves_ordering, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_returns_futures, test/distributed/checkpoint/_experimental/test_checkpointer.py::TestAsyncCheckpointerSpecific::test_async_sequential_saves_wait 2025-12-04T10:50:07.0670602Z 2025-12-04T10:50:07.0670918Z Finished distributed/checkpoint/_experimental/test_checkpointer 1/1 ... [2025-12-04 10:50:07.065734][5223448.044768197], took 0.34min 2025-12-04T10:50:07.0671832Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:50:07.0681825Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:50:07.0684127Z Running distributed/tensor/test_api 1/1 ... 
[2025-12-04 10:50:07.068276][5223448.047316745] 2025-12-04T10:50:07.0684399Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:50:07.0686036Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:50:07.068497] 2025-12-04T10:51:01.8228151Z 2025-12-04T10:51:01.8232018Z distributed/tensor/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_api_1.1_a0e6e6cdf7c9d61c_.log 2025-12-04T10:51:01.8238224Z Running 18 items in this shard: test/distributed/tensor/test_api.py::DTensorAPITest::test_checkpoint_apis_check_partial_placement, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_casting, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_input_fn_output_fn, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_input_fn_output_fn_warning, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_meta, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_errors, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_rank, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_uneven_sharding, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_checkpoint_apis_check_partial_placement, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_casting, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_input_fn_output_fn, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_input_fn_output_fn_warning, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_meta, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_errors, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_rank, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_uneven_sharding 2025-12-04T10:51:01.8243293Z 2025-12-04T10:51:01.8243540Z Finished distributed/tensor/test_api 1/1 ... [2025-12-04 10:51:01.822642][5223502.801679633], took 0.91min 2025-12-04T10:51:01.8244225Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:51:01.8247074Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:51:01.8249442Z Running distributed/tensor/test_init 1/1 ... [2025-12-04 10:51:01.824851][5223502.803893015] 2025-12-04T10:51:01.8249925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:51:01.8251937Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:51:01.825068] 2025-12-04T10:51:35.8442139Z 2025-12-04T10:51:35.8442804Z distributed/tensor/test_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_init_1.1_cead5736e32acdcb_.log 2025-12-04T10:51:35.8444985Z Running 13 items in this shard: test/distributed/tensor/test_init.py::DTensorInitOpsTest::test_init_ops, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_empty, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_full, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_ones, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_zeros, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_zeros_full_mesh, test/distributed/tensor/test_init.py::DTensorConstructorTest::test_zeros_submesh, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_empty, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_full, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_ones, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_zeros, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_zeros_full_mesh, test/distributed/tensor/test_init.py::DTensorConstructorTestWithLocalTensor::test_zeros_submesh 2025-12-04T10:51:35.8447267Z 2025-12-04T10:51:35.8447395Z Finished distributed/tensor/test_init 1/1 ... [2025-12-04 10:51:35.843982][5223536.823018951], took 0.57min 2025-12-04T10:51:35.8447968Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:51:35.8458504Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:51:35.8461095Z Running distributed/checkpoint/e2e/test_fine_tuning 1/1 ... [2025-12-04 10:51:35.846007][5223536.825048315] 2025-12-04T10:51:35.8461320Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:51:35.8463267Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/e2e/test_fine_tuning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:51:35.846223] 2025-12-04T10:51:46.3771657Z 2025-12-04T10:51:46.3773151Z distributed/checkpoint/e2e/test_fine_tuning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.e2e.test_fine_tuning_1.1_f56a2870d773053c_.log 2025-12-04T10:51:46.3774293Z Running 1 items in this shard: test/distributed/checkpoint/e2e/test_fine_tuning.py::TestFineTuning::test_fine_tuning 2025-12-04T10:51:46.3774730Z 2025-12-04T10:51:46.3775057Z Finished distributed/checkpoint/e2e/test_fine_tuning 1/1 ... [2025-12-04 10:51:46.377087][5223547.356123006], took 0.18min 2025-12-04T10:51:46.3782533Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:51:46.3794447Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:51:46.3796823Z Running distributed/tensor/test_matrix_ops 1/1 ... 
[2025-12-04 10:51:46.379600][5223547.358641694] 2025-12-04T10:51:46.3797171Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:51:46.3800077Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_matrix_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:51:46.379812] 2025-12-04T10:53:28.4781104Z 2025-12-04T10:53:28.4781990Z distributed/tensor/test_matrix_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_matrix_ops_1.1_9b0fbc70f0d13ab5_.log 2025-12-04T10:53:28.4786291Z Running 30 items in this shard: test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_addmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_addmm_auto_redistribute, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_addmm_empty_operand, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_baddbmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_bmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_dtensor_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_grouped_mm_kwargs0, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_grouped_mm_kwargs1, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_matmul, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_scaled_dot_product_attention, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_scaled_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_t, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_t_partial, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTest::test_tensordot_shampoo, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_addmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_addmm_auto_redistribute, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_addmm_empty_operand, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_baddbmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_bmm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_dtensor_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_grouped_mm_kwargs0, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_grouped_mm_kwargs1, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_matmul, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_scaled_dot_product_attention, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_scaled_mm, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_t, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_t_partial, test/distributed/tensor/test_matrix_ops.py::DistMatrixOpsTestWithLocalTensor::test_tensordot_shampoo 2025-12-04T10:53:28.4791137Z 2025-12-04T10:53:28.4791281Z Finished distributed/tensor/test_matrix_ops 1/1 ... 
[2025-12-04 10:53:28.477791][5223649.456830324], took 1.70min 2025-12-04T10:53:28.4791759Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:53:28.4796645Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:53:28.4799010Z Running distributed/pipelining/test_stage 1/1 ... [2025-12-04 10:53:28.479822][5223649.458863738] 2025-12-04T10:53:28.4799239Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:53:28.4801258Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_stage.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:53:28.480013] 2025-12-04T10:53:55.5455883Z 2025-12-04T10:53:55.5457472Z distributed/pipelining/test_stage 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_stage_1.1_55b799f0e82c0299_.log 2025-12-04T10:53:55.5459437Z Running 8 items in this shard: test/distributed/pipelining/test_stage.py::StageTest::test_custom_dw_with_fb_schedule, test/distributed/pipelining/test_stage.py::StageTest::test_manual, test/distributed/pipelining/test_stage.py::StageTest::test_output_chunks_memory_usage, test/distributed/pipelining/test_stage.py::StageTest::test_tracer_ModelClass0, test/distributed/pipelining/test_stage.py::StageTest::test_tracer_ModelClass1, test/distributed/pipelining/test_stage.py::StageTest::test_tracer_kwargs_ModelClass0, test/distributed/pipelining/test_stage.py::StageNegativeTest::test_custom_dw_errors, test/distributed/pipelining/test_stage.py::StageNegativeTest::test_shape_prop_mismatch 2025-12-04T10:53:55.5460995Z 2025-12-04T10:53:55.5461202Z Finished distributed/pipelining/test_stage 1/1 ... [2025-12-04 10:53:55.545358][5223676.524394337], took 0.45min 2025-12-04T10:53:55.5464930Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:53:55.5476033Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:53:55.5478059Z Running distributed/tensor/parallel/test_tp_random_state 1/1 ... [2025-12-04 10:53:55.547715][5223676.526756487] 2025-12-04T10:53:55.5478498Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:53:55.5480632Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_random_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:53:55.547930] 2025-12-04T10:54:03.8750673Z 2025-12-04T10:54:03.8751989Z distributed/tensor/parallel/test_tp_random_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_tp_random_state_1.1_3ff170dcbb1f0e74_.log 2025-12-04T10:54:03.8753529Z Running 1 items in this shard: test/distributed/tensor/parallel/test_tp_random_state.py::TensorParallelRandomStateTests::test_model_init 2025-12-04T10:54:03.8754132Z 2025-12-04T10:54:03.8754579Z Finished distributed/tensor/parallel/test_tp_random_state 1/1 ... 
[2025-12-04 10:54:03.874707][5223684.853742402], took 0.14min 2025-12-04T10:54:03.8759754Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:54:03.8773191Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:54:03.8774063Z Running distributed/checkpoint/test_planner 1/1 ... [2025-12-04 10:54:03.877247][5223684.856288749] 2025-12-04T10:54:03.8774340Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:54:03.8776590Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_planner.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:54:03.877471] 2025-12-04T10:54:06.0954195Z 2025-12-04T10:54:06.0955222Z distributed/checkpoint/test_planner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_planner_1.1_c0fc3fc5e7160f63_.log 2025-12-04T10:54:06.0961715Z Running 17 items in this shard: test/distributed/checkpoint/test_planner.py::TestSavePlan::test_dedup_plans, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_finish_plan_with_caching, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_global_plan, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_global_plan_with_caching, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_load_with_resharding, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_load_with_world_size_diff_by_one, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_local_load_plan, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_local_plan, test/distributed/checkpoint/test_planner.py::TestSavePlan::test_local_plan_with_caching, test/distributed/checkpoint/test_planner.py::TestPlannerHelpers::test_compare_save_plans, test/distributed/checkpoint/test_planner.py::TestPlannerHelpers::test_create_read_item_from_chunks, test/distributed/checkpoint/test_planner.py::TestPlannerHelpers::test_merge_delta_local_plans, test/distributed/checkpoint/test_planner.py::TestValidateGlobalPlan::test_detect_overlapping_chunks, test/distributed/checkpoint/test_planner.py::TestValidateGlobalPlan::test_non_overlapping_chunks, test/distributed/checkpoint/test_planner.py::TestLoadPlanner::test_load_different_sizes_throws, test/distributed/checkpoint/test_planner.py::TestLoadPlanner::test_strict, test/distributed/checkpoint/test_planner.py::TestLoadPlanner::test_version_key_in_planner_data 2025-12-04T10:54:06.0966680Z 2025-12-04T10:54:06.0966988Z Finished distributed/checkpoint/test_planner 1/1 ... [2025-12-04 10:54:06.095178][5223687.074212342], took 0.04min 2025-12-04T10:54:06.0967998Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:54:06.0976181Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:54:06.0978442Z Running distributed/checkpoint/test_dtensor_checkpoint 1/1 ... 
[2025-12-04 10:54:06.097755][5223687.076796169] 2025-12-04T10:54:06.0978751Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:54:06.0981276Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_dtensor_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:54:06.097983] 2025-12-04T10:54:13.2752499Z 2025-12-04T10:54:13.2753687Z distributed/checkpoint/test_dtensor_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_dtensor_checkpoint_1.1_889486be9b42fe95_.log 2025-12-04T10:54:13.2755124Z Running 1 items in this shard: test/distributed/checkpoint/test_dtensor_checkpoint.py::DTensorPlanner::test_distributed_tensor_planner 2025-12-04T10:54:13.2755739Z 2025-12-04T10:54:13.2756175Z Finished distributed/checkpoint/test_dtensor_checkpoint 1/1 ... [2025-12-04 10:54:13.275007][5223694.254041466], took 0.12min 2025-12-04T10:54:13.2761906Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:54:13.2771789Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:54:13.2773883Z Running distributed/pipelining/test_schedule 1/1 ... [2025-12-04 10:54:13.277266][5223694.256307927] 2025-12-04T10:54:13.2774218Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:54:13.2776293Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_schedule.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:54:13.277483] 2025-12-04T10:54:52.4951120Z 2025-12-04T10:54:52.4951940Z distributed/pipelining/test_schedule 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_schedule_1.1_5adcec24b68fb3cf_.log 2025-12-04T10:54:52.4962891Z Running 43 items in this shard: test/distributed/pipelining/test_schedule.py::ScheduleTest::test_get_schedule_class, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_eval_then_train_ScheduleClass0, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_eval_then_train_ScheduleClass1, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_eval_then_train_ScheduleClass2, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_eval_then_train_ScheduleClass3, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_eval_then_train_ScheduleClass4, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_with_single_stage_ScheduleClass0, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_with_single_stage_ScheduleClass1, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_with_single_stage_ScheduleClass2, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_with_single_stage_ScheduleClass3, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_schedule_with_single_stage_ScheduleClass4, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_zero_bubble_schedule_errors_with_compile_ScheduleClass0, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_zero_bubble_schedule_errors_with_compile_ScheduleClass1, test/distributed/pipelining/test_schedule.py::ScheduleTest::test_zero_bubble_schedule_errors_with_compile_ScheduleClass2, test/distributed/pipelining/test_schedule.py::TestSchedulePlan::test_pipeline_order_ScheduleClass0, test/distributed/pipelining/test_schedule.py::TestSchedulePlan::test_pipeline_order_ScheduleClass1, test/distributed/pipelining/test_schedule.py::TestSchedulePlan::test_pipeline_order_flex_and_zero_bubble_ScheduleClass0, test/distributed/pipelining/test_schedule.py::TestSchedulePlan::test_pipeline_order_flex_and_zero_bubble_ScheduleClass1, test/distributed/pipelining/test_schedule.py::TestSchedulePlan::test_pipeline_order_for_v_schedules_ScheduleClass0, test/distributed/pipelining/test_schedule.py::TestSchedulePlan::test_pipeline_order_for_v_schedules_ScheduleClass1, test/distributed/pipelining/test_schedule.py::TestScheduleCsv::test_csv_compare_ScheduleClass0_csv_name_dualpipev_4rank_10mb, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref0, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref1, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref2, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref3, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref4, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref5, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref6, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_action_parse_action_str_and_ref7, 
test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_csv_csv_name_zb1p_2rank_2stagep, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_grad_with_split_b_w, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_grad_with_v_schedule, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_merge_bw_test_info0, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_reduce_grad_test_info0, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_reduce_grad_test_info1, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_send_recv_test_info0, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_send_recv_test_info1, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_unshard_reshard_test_info0, test/distributed/pipelining/test_schedule.py::TestScheduleLowering::test_unshard_reshard_test_info1, test/distributed/pipelining/test_schedule.py::TestValidateSchedule::test_invalid_schedule_missing_action, test/distributed/pipelining/test_schedule.py::TestValidateSchedule::test_invalid_schedule_missing_rank, test/distributed/pipelining/test_schedule.py::TestValidateSchedule::test_valid_schedule, test/distributed/pipelining/test_schedule.py::ScheduleUtilTests::test_generate_stage_to_rank_mapping 2025-12-04T10:54:52.4971295Z 2025-12-04T10:54:52.4971512Z Finished distributed/pipelining/test_schedule 1/1 ... [2025-12-04 10:54:52.494740][5223733.473778975], took 0.65min 2025-12-04T10:54:52.4972021Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:54:52.4972469Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:54:52.4972767Z Running distributed/_composable/fsdp/test_fully_shard_overlap 1/1 ... [2025-12-04 10:54:52.496889][5223733.475929578] 2025-12-04T10:54:52.4973022Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:54:52.4973519Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_overlap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:54:52.497085] 2025-12-04T10:55:02.7280847Z 2025-12-04T10:55:02.7281791Z distributed/_composable/fsdp/test_fully_shard_overlap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_overlap_1.1_01893b58c1154003_.log 2025-12-04T10:55:02.7283080Z Running 2 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_overlap.py::TestFullyShardOverlap::test_fully_shard_post_optim_event_overlap, test/distributed/_composable/fsdp/test_fully_shard_overlap.py::TestFullyShardOverlap::test_fully_shard_training_overlap 2025-12-04T10:55:02.7283543Z 2025-12-04T10:55:02.7283717Z Finished distributed/_composable/fsdp/test_fully_shard_overlap 1/1 ... 
[2025-12-04 10:55:02.727702][5223743.706737574], took 0.17min 2025-12-04T10:55:02.7290514Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:55:02.7304843Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:55:02.7306806Z Running distributed/test_run 1/1 ... [2025-12-04 10:55:02.730532][5223743.709573408] 2025-12-04T10:55:02.7307269Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:55:02.7308854Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_run.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:55:02.730740] 2025-12-04T10:55:04.9488048Z 2025-12-04T10:55:04.9489047Z distributed/test_run 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_run_1.1_ae77be8219a4a84d_.log 2025-12-04T10:55:04.9491124Z Running 4 items in this shard: test/distributed/test_run.py::RunTest::test_config_from_args_signals_to_handle, test/distributed/test_run.py::RunTest::test_launch_agent_sets_environment_variable, test/distributed/test_run.py::RunTest::test_signals_to_handle_custom, test/distributed/test_run.py::RunTest::test_signals_to_handle_default 2025-12-04T10:55:04.9492425Z 2025-12-04T10:55:04.9492740Z Finished distributed/test_run 1/1 ... [2025-12-04 10:55:04.948422][5223745.927458034], took 0.04min 2025-12-04T10:55:04.9499065Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:55:04.9513489Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:55:04.9514750Z Running distributed/tensor/test_math_ops 1/1 ... [2025-12-04 10:55:04.951334][5223745.930374486] 2025-12-04T10:55:04.9515463Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:55:04.9517847Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_math_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:55:04.951570] 2025-12-04T10:57:40.1900990Z 2025-12-04T10:57:40.1901879Z distributed/tensor/test_math_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_math_ops_1.1_4b2d9d7d9577b7a3_.log 2025-12-04T10:57:40.1913777Z Running 54 items in this shard: test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_conj_complex_dtensor, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_cumsum, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_foreach_add_different_mesh, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_foreach_norm, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_foreach_norm_different_mesh, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_foreach_norm_partial, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_histc, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_layer_norm_bwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_layer_norm_bwd_req_grad, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_layer_norm_fwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_linalg_eigh, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_linear_op_reductions, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_logsumexp, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_matching_partial_reduction_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_mean, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_nll_loss_and_cross_entropy, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_partial_reduction_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_rotary_embedding_complex_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_shard0_svd, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_shard_math_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_softmax_fwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_softmax_with_bwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_std, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_topk, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_upsampling, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_vector_norm, test/distributed/tensor/test_math_ops.py::DistMathOpsTest::test_vector_norm_partial, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_conj_complex_dtensor, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_cumsum, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_foreach_add_different_mesh, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_foreach_norm, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_foreach_norm_different_mesh, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_foreach_norm_partial, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_histc, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_layer_norm_bwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_layer_norm_bwd_req_grad, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_layer_norm_fwd, 
test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_linalg_eigh, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_linear_op_reductions, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_logsumexp, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_matching_partial_reduction_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_mean, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_nll_loss_and_cross_entropy, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_partial_reduction_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_rotary_embedding_complex_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_shard0_svd, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_shard_math_ops, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_softmax_fwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_softmax_with_bwd, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_std, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_topk, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_upsampling, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_vector_norm, test/distributed/tensor/test_math_ops.py::DistMathOpsTestWithLocalTensor::test_vector_norm_partial 2025-12-04T10:57:40.1922529Z 2025-12-04T10:57:40.1922674Z Finished distributed/tensor/test_math_ops 1/1 ... [2025-12-04 10:57:40.189780][5223901.168818333], took 2.59min 2025-12-04T10:57:40.1923180Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T10:57:40.1923584Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:57:40.1923811Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T10:57:40.1923999Z Uploading artifacts took 0.00 seconds 2025-12-04T10:57:40.1924508Z Running distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 10:57:40.192336][5223901.171368421] 2025-12-04T10:57:40.1924726Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:57:40.1926564Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_pointwise_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:57:40.192563] 2025-12-04T11:27:51.1753855Z 2025-12-04T11:27:51.1754713Z PRINTING LOG FILE of distributed/tensor/test_pointwise_ops 1/1 (test/test-reports/distributed.tensor.test_pointwise_ops_1.1_b92502ef2d44a395_.log) 2025-12-04T11:27:51.1755685Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-7404e7f097e17487.xml 2025-12-04T11:27:51.1756346Z ============================= test session starts ============================== 2025-12-04T11:27:51.1756858Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:27:51.1757270Z cachedir: .pytest_cache 2025-12-04T11:27:51.1757799Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:27:51.1758325Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:27:51.1758564Z configfile: pytest.ini 2025-12-04T11:27:51.1759026Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:27:51.1759531Z collecting ... collected 18 items 2025-12-04T11:27:51.1759989Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:27:51.1766370Z Running 18 items in this shard: test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_replicate_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_replicate_add 2025-12-04T11:27:51.1770887Z 2025-12-04T11:27:51.1771103Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_activations PASSED [0.5603s] [ 5%] 2025-12-04T11:27:51.1771717Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout SKIPPED [0.0002s] (testing RNG based ops is broken: https://github.com/pytorch/PiPPy/issues/494) [ 11%] 
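The pytest session header above reports loading a Hypothesis profile named 'pytorch_ci' (database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]). The snippet below is only a minimal sketch of how such a profile is registered with the Hypothesis settings API; it is not PyTorch's actual conftest code, and where PyTorch loads the profile is not shown in this log.

# Minimal sketch (not PyTorch's actual conftest) of registering the Hypothesis
# profile the session header reports: 'pytorch_ci' -> database=None,
# max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow].
from hypothesis import HealthCheck, settings

settings.register_profile(
    "pytorch_ci",
    database=None,                                 # do not persist failing examples between runs
    max_examples=50,                               # bound per-test property-based search
    derandomize=True,                              # deterministic example generation in CI
    suppress_health_check=[HealthCheck.too_slow],  # slow CI hardware should not trip health checks
)
settings.load_profile("pytorch_ci")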
2025-12-04T11:27:51.1772341Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward Command took >30min, returning 124 2025-12-04T11:27:51.1772670Z Got exit code 124 2025-12-04T11:27:51.1772813Z Retrying single test... 2025-12-04T11:27:51.1773216Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-26d6a19f76ee9373.xml 2025-12-04T11:27:51.1773652Z ============================= test session starts ============================== 2025-12-04T11:27:51.1773961Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:27:51.1774240Z cachedir: .pytest_cache 2025-12-04T11:27:51.1774556Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:27:51.1774902Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:27:51.1775065Z configfile: pytest.ini 2025-12-04T11:27:51.1775392Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:27:51.1775783Z collecting ... collected 18 items / 17 deselected / 1 selected 2025-12-04T11:27:51.1776224Z stepcurrent: skipping 2 already run items. Running only test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward 2025-12-04T11:27:51.1776608Z Running 1 items in this shard 2025-12-04T11:27:51.1776714Z 2025-12-04T11:27:51.1776926Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward PASSED [0.4689s] [100%] 2025-12-04T11:27:51.1777191Z 2025-12-04T11:27:51.1777567Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-26d6a19f76ee9373.xml - 2025-12-04T11:27:51.1778061Z ======================= 1 passed, 17 deselected in 0.48s ======================= 2025-12-04T11:27:51.1778227Z Got exit code 0 2025-12-04T11:27:51.1778441Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T11:27:51.1778805Z Test results will be stored in test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-a434aa288c83b7b1.xml 2025-12-04T11:27:51.1779141Z ============================= test session starts ============================== 2025-12-04T11:27:51.1779373Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:27:51.1779623Z cachedir: .pytest_cache 2025-12-04T11:27:51.1779875Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:27:51.1780137Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:27:51.1780263Z configfile: pytest.ini 2025-12-04T11:27:51.1780516Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:27:51.1780822Z collecting ... collected 18 items / 3 deselected / 15 selected 2025-12-04T11:27:51.1781005Z stepcurrent: skipping 3 already run items. 
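In the run above, the pointwise-ops command exceeded its time budget and returned exit code 124, so the harness reran only the interrupted test (test_dropout_backward) in a fresh process, got exit code 0, and then continued with the remaining items. The sketch below is a rough, hypothetical illustration of that control flow; the helper name run_pytest, the timeout constant, and the argument lists are assumptions, not PyTorch's actual run_test.py logic.

# Hypothetical sketch of the timeout-and-retry behaviour visible above:
# exit code 124 marks a timed-out pytest command; the interrupted test is then
# rerun alone in a new process, and the shard continues if that retry passes.
# run_pytest, TIMEOUT_S and the argument lists are illustrative assumptions.
import subprocess
import sys

TIMEOUT_S = 30 * 60  # the log reports "Command took >30min, returning 124"

def run_pytest(args, timeout=None):
    """Run pytest in a subprocess and return its exit code (124 on timeout)."""
    try:
        proc = subprocess.run([sys.executable, "-m", "pytest", *args], timeout=timeout)
        return proc.returncode
    except subprocess.TimeoutExpired:
        return 124

code = run_pytest(["distributed/tensor/test_pointwise_ops.py", "-v"], timeout=TIMEOUT_S)
if code == 124:
    # Retry only the test that was running when the timeout hit, in a fresh process.
    single = "distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward"
    if run_pytest([single, "-v"]) == 0:
        # "Test succeeded in new process, continuing with the rest of the tests":
        # resume the shard, skipping the items that already ran (selection omitted here).
        pass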
2025-12-04T11:27:51.1781152Z Running 15 items in this shard 2025-12-04T11:27:51.1781230Z 2025-12-04T11:27:51.1781394Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_errors PASSED [0.3786s] [ 6%] 2025-12-04T11:27:51.1781782Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_inplace_op_partial_to_replicate PASSED [0.0539s] [ 13%] 2025-12-04T11:27:51.1782149Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_out PASSED [0.0941s] [ 20%] 2025-12-04T11:27:51.1782535Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_partial PASSED [0.0736s] [ 26%] 2025-12-04T11:27:51.1782874Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_add PASSED [0.0110s] [ 33%] 2025-12-04T11:27:51.1786079Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_replicate_add PASSED [0.0268s] [ 40%] 2025-12-04T11:27:51.1786481Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_activations PASSED [0.1851s] [ 46%] 2025-12-04T11:27:51.1787004Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout SKIPPED [0.0002s] (testing RNG based ops is broken: https://github.com/pytorch/PiPPy/issues/494) [ 53%] 2025-12-04T11:27:51.1787534Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_backward PASSED [0.0201s] [ 60%] 2025-12-04T11:27:51.1787961Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_errors PASSED [0.0081s] [ 66%] 2025-12-04T11:27:51.1788403Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_inplace_op_partial_to_replicate PASSED [0.0323s] [ 73%] 2025-12-04T11:27:51.1788808Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_out PASSED [0.0414s] [ 80%] 2025-12-04T11:27:51.1789164Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial PASSED [0.2208s] [ 86%] 2025-12-04T11:27:51.1789527Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_add PASSED [0.0288s] [ 93%] 2025-12-04T11:27:51.1789947Z distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_replicate_add PASSED [0.1147s] [100%] 2025-12-04T11:27:51.1790159Z 2025-12-04T11:27:51.1790415Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-a434aa288c83b7b1.xml - 2025-12-04T11:27:51.1790781Z ================= 14 passed, 1 skipped, 3 deselected in 1.31s ================== 2025-12-04T11:27:51.1791115Z [W1204 11:27:50.971711057 ProcessGroup.cpp:367] Warning: At the time of process termination, there are still 12 unwaited collective calls. Please review your program to ensure that: 2025-12-04T11:27:51.1791529Z 1. c10d_functional.wait_tensor() is invoked on all tensors returned from c10d_functional collective, 2025-12-04T11:27:51.1791899Z 2. c10d_functional.wait_tensor() is invoked on all output tensors of async_op=True torch.distributed collective called under `with allow_inflight_collective_as_graph_input_ctx():`, 2025-12-04T11:27:51.1792243Z before the output tensors of the collective are used. 
(function ~WorkRegistry) 2025-12-04T11:27:51.1792587Z The following tests failed and then succeeded when run in a new process['test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward'] 2025-12-04T11:27:51.1792834Z 2025-12-04T11:27:51.1793040Z FINISHED PRINTING LOG FILE of distributed/tensor/test_pointwise_ops 1/1 (test/test-reports/distributed.tensor.test_pointwise_ops_1.1_b92502ef2d44a395_.log) 2025-12-04T11:27:51.1793277Z 2025-12-04T11:27:51.1793410Z Finished distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 11:27:51.174672][5225712.153709353], took 30.18min 2025-12-04T11:27:51.1793846Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:27:51.1794241Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:27:51.1794455Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:27:51.1794634Z Uploading artifacts took 0.00 seconds 2025-12-04T11:27:51.1794875Z Running distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 11:27:51.176855][5225712.155896255] 2025-12-04T11:27:51.1795083Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:27:51.1795503Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_compatibility.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:27:51.177075] 2025-12-04T11:27:53.3449020Z 2025-12-04T11:27:53.3449851Z distributed/checkpoint/test_compatibility 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_compatibility_1.1_70f9582afd7d5a9a_.log 2025-12-04T11:27:53.3451880Z Running 4 items in this shard: test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_metadata, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_sharded_tensor_dependency, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_storage_meta, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_with_v_2_3 2025-12-04T11:27:53.3453231Z 2025-12-04T11:27:53.3453556Z Finished distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 11:27:53.344601][5225714.323637149], took 0.04min 2025-12-04T11:27:53.3459855Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:27:53.3470766Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:27:53.3473153Z Running distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 11:27:53.347231][5225714.326272766] 2025-12-04T11:27:53.3473473Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:27:53.3475615Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_mem_tracker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:27:53.347450] 2025-12-04T11:28:00.7289999Z 2025-12-04T11:28:00.7291431Z distributed/_tools/test_mem_tracker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_mem_tracker_1.1_bc0610b4681d5408_.log 2025-12-04T11:28:00.7292801Z Running 3 items in this shard: test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_accelerator_tracker_equivalence, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_attribution, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_with_activation_checkpointing 2025-12-04T11:28:00.7293429Z 2025-12-04T11:28:00.7293570Z Finished distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 11:28:00.728644][5225721.707680062], took 0.12min 2025-12-04T11:28:00.7299213Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:28:00.7310440Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:28:00.7312346Z Running distributed/elastic/test_control_plane 1/1 ... [2025-12-04 11:28:00.731159][5225721.71020011] 2025-12-04T11:28:00.7312573Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:28:00.7314474Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/test_control_plane.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:28:00.731349] 2025-12-04T11:28:03.2997214Z 2025-12-04T11:28:03.2998150Z distributed/elastic/test_control_plane 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.test_control_plane_1.1_f4cb43a87c9834ba_.log 2025-12-04T11:28:03.3002405Z Running 10 items in this shard: test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_json, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_params, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_traceback, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_names, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_nonexistant, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_run_handler, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_tcp, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_wait_counter_values, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_worker_server 2025-12-04T11:28:03.3005204Z 2025-12-04T11:28:03.3005498Z Finished distributed/elastic/test_control_plane 1/1 ... [2025-12-04 11:28:03.299403][5225724.278439329], took 0.04min 2025-12-04T11:28:03.3009310Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:28:03.3021620Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:28:03.3024261Z Running distributed/fsdp/test_fsdp_overlap 1/1 ... 
[2025-12-04 11:28:03.302331][5225724.281372351] 2025-12-04T11:28:03.3024556Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:28:03.3026747Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_overlap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:28:03.302540] 2025-12-04T11:29:17.7192518Z 2025-12-04T11:29:17.7193472Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_576f9da47548da7f_.log) 2025-12-04T11:29:17.7194846Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-c29609b993a3a584.xml 2025-12-04T11:29:17.7195768Z ============================= test session starts ============================== 2025-12-04T11:29:17.7196687Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:17.7197003Z cachedir: .pytest_cache 2025-12-04T11:29:17.7197427Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:17.7197887Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:29:17.7198100Z configfile: pytest.ini 2025-12-04T11:29:17.7198478Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:17.7198875Z collecting ... collected 1 item 2025-12-04T11:29:17.7199103Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:29:17.7209413Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:29:17.7209812Z 2025-12-04T11:29:17.7210243Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 11:28:05.059000 127756 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 127825 2025-12-04T11:29:17.7211062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T11:29:17.7211511Z _init_core_state( 2025-12-04T11:29:17.7212291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T11:29:17.7213249Z _warn_cpu_init() 2025-12-04T11:29:17.7213499Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:29:17.7213915Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:29:17.7214511Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7215093Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:29:17.7215669Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7216210Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:29:17.7216655Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7217118Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:29:17.7217582Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7218047Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:29:17.7218542Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7218991Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:29:17.7219445Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7219946Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:29:17.7220636Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 
2025-12-04T11:29:17.7221270Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:29:17.7221620Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7222217Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7222762Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:29:17.7223128Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7223546Z [rank0]:E1204 11:28:25.353000 127825 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:29:17.7223789Z dist init r=0, world=1 2025-12-04T11:29:17.7223857Z 2025-12-04T11:29:17.7223892Z rank0: 2025-12-04T11:29:17.7224101Z e1: {'cpu_iter': 0.0008449301000000631, 'cpu_wait': 1.7167000000206656e-05, 'gpu_compute': 0.017279599979519843, 'gpu_total': 0.3514355033636093} 2025-12-04T11:29:17.7224439Z e2: {'cpu_iter': 0.0019283733999998277, 'cpu_wait': 1.8078999999993072e-05, 'gpu_compute': 0.03761139996349812, 'gpu_total': 0.7945138156414032} 2025-12-04T11:29:17.7224765Z e3: {'cpu_iter': 0.0015969616999997882, 'cpu_wait': 0.39530624519999974, 'gpu_compute': 397.0044761657715, 'gpu_total': 397.34800109863284} 2025-12-04T11:29:17.7225078Z e4: {'cpu_iter': 0.0036101315999992776, 'cpu_wait': 0.7498134582999999, 'gpu_compute': 397.0139923095703, 'gpu_total': 397.546044921875} 2025-12-04T11:29:17.7225613Z [rank0]:[W1204 11:28:25.605091950 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:29:17.7226032Z FAILED [21.7298s] [100%] 2025-12-04T11:29:17.7226102Z 2025-12-04T11:29:17.7226161Z =================================== FAILURES =================================== 2025-12-04T11:29:17.7226363Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T11:29:17.7226552Z Traceback (most recent call last): 2025-12-04T11:29:17.7226807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:29:17.7227055Z self._join_processes(fn) 2025-12-04T11:29:17.7227303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:29:17.7227604Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:29:17.7227874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:29:17.7228133Z raise RuntimeError(error) 2025-12-04T11:29:17.7228289Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:29:17.7228452Z Traceback (most recent call last): 2025-12-04T11:29:17.7228693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7228937Z getattr(self, test_name)() 2025-12-04T11:29:17.7229170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7229403Z fn() 2025-12-04T11:29:17.7229660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7229897Z method(*args, **kwargs) 2025-12-04T11:29:17.7230120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7230351Z method(*args, **kwargs) 2025-12-04T11:29:17.7230569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7230797Z with policy(): 2025-12-04T11:29:17.7231010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7231278Z raise RuntimeError(msg) 2025-12-04T11:29:17.7231692Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:29:17.7232067Z 2025-12-04T11:29:17.7232144Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7232488Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7232752Z 2025-12-04T11:29:17.7232846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7232973Z 2025-12-04T11:29:17.7232974Z 2025-12-04T11:29:17.7233057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:17.7233263Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:29:17.7233632Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-c29609b993a3a584.xml - 2025-12-04T11:29:17.7233974Z =========================== short test summary info ============================ 2025-12-04T11:29:17.7234327Z FAILED [21.7298s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:29:17.7234656Z Traceback (most recent call last): 2025-12-04T11:29:17.7234902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7235148Z getattr(self, test_name)() 2025-12-04T11:29:17.7235384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7235624Z fn() 2025-12-04T11:29:17.7235826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7236059Z method(*args, **kwargs) 2025-12-04T11:29:17.7236280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7236561Z method(*args, **kwargs) 2025-12-04T11:29:17.7236779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7237007Z with policy(): 2025-12-04T11:29:17.7237219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7237450Z raise RuntimeError(msg) 2025-12-04T11:29:17.7237867Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:29:17.7238243Z 2025-12-04T11:29:17.7238320Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7238658Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7238920Z 2025-12-04T11:29:17.7239008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7239198Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:29:17.7239359Z ============================== 1 failed in 21.89s ============================== 2025-12-04T11:29:17.7239493Z Got exit code 1 2025-12-04T11:29:17.7239662Z Retrying single test... 
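The failure above was raised by the CUDA memory-leak checker enabled for this shard (the job config includes mem_leak_check, and the repro line sets PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): allocation is sampled before and after the test, and the test fails if memory grew. The snippet below is only a loose sketch of the caching-allocator half of that comparison; the real check (the common_utils.py __exit__ frames in the traceback) also consults CUDA driver-level allocation.

# Loose illustrative sketch of a before/after caching-allocator comparison in the
# spirit of the leak check that failed above. The real checker in
# torch/testing/_internal/common_utils.py also tracks CUDA driver allocation.
import gc
from contextlib import contextmanager

import torch

@contextmanager
def assert_no_cuda_leak(device=0):
    torch.cuda.synchronize(device)
    gc.collect()
    before = torch.cuda.memory_allocated(device)   # caching-allocator bytes before the test
    yield
    torch.cuda.synchronize(device)
    gc.collect()
    after = torch.cuda.memory_allocated(device)    # and after it
    if after > before:
        raise RuntimeError(
            f"Possible CUDA memory leak: caching allocator allocated memory was "
            f"{before} and is now reported as {after} on device {device}."
        )

# Usage sketch: wrap the body of a single test.
# with assert_no_cuda_leak(device=0):
#     run_the_test()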
2025-12-04T11:29:17.7239931Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-7ccc9be6dff28e59.xml 2025-12-04T11:29:17.7240222Z ============================= test session starts ============================== 2025-12-04T11:29:17.7240436Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:17.7240624Z cachedir: .pytest_cache 2025-12-04T11:29:17.7240849Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:17.7241086Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:29:17.7241204Z configfile: pytest.ini 2025-12-04T11:29:17.7241432Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:17.7241674Z collecting ... collected 1 item 2025-12-04T11:29:17.7241967Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:29:17.7242261Z Running 1 items in this shard 2025-12-04T11:29:17.7242332Z 2025-12-04T11:29:17.7242642Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 11:28:29.156000 127908 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 127977 2025-12-04T11:29:17.7243282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T11:29:17.7243647Z _init_core_state( 2025-12-04T11:29:17.7244292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T11:29:17.7244939Z _warn_cpu_init() 2025-12-04T11:29:17.7245279Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:29:17.7245665Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:29:17.7246152Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7246632Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:29:17.7247108Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7247554Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:29:17.7248049Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7248512Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:29:17.7248973Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7249469Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:29:17.7249973Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7250427Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:29:17.7250880Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7251344Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:29:17.7252008Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 
2025-12-04T11:29:17.7252630Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:29:17.7252976Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7253561Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7254062Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:29:17.7254421Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7254863Z [rank0]:E1204 11:28:49.438000 127977 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:29:17.7255102Z dist init r=0, world=1 2025-12-04T11:29:17.7255164Z 2025-12-04T11:29:17.7255201Z rank0: 2025-12-04T11:29:17.7255405Z e1: {'cpu_iter': 0.0008104815999999459, 'cpu_wait': 1.8289099999790892e-05, 'gpu_compute': 0.01727570001967251, 'gpu_total': 0.33282740116119386} 2025-12-04T11:29:17.7255736Z e2: {'cpu_iter': 0.0019234882000001008, 'cpu_wait': 1.888799999996138e-05, 'gpu_compute': 0.039115499798208477, 'gpu_total': 0.7730338037014007} 2025-12-04T11:29:17.7256057Z e3: {'cpu_iter': 0.001446042500000111, 'cpu_wait': 0.39556908509999966, 'gpu_compute': 397.13989410400393, 'gpu_total': 397.4113311767578} 2025-12-04T11:29:17.7256370Z e4: {'cpu_iter': 0.0032382339999998066, 'cpu_wait': 0.7502928507999993, 'gpu_compute': 397.148299407959, 'gpu_total': 397.53850708007815} 2025-12-04T11:29:17.7256888Z [rank0]:[W1204 11:28:49.605995528 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:29:17.7257298Z FAILED [21.6340s] [100%] 2025-12-04T11:29:17.7257364Z 2025-12-04T11:29:17.7257422Z =================================== FAILURES =================================== 2025-12-04T11:29:17.7257618Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T11:29:17.7257799Z Traceback (most recent call last): 2025-12-04T11:29:17.7258079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:29:17.7258322Z self._join_processes(fn) 2025-12-04T11:29:17.7258566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:29:17.7258828Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:29:17.7259095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:29:17.7259353Z raise RuntimeError(error) 2025-12-04T11:29:17.7259502Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:29:17.7259705Z Traceback (most recent call last): 2025-12-04T11:29:17.7259944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7260197Z getattr(self, test_name)() 2025-12-04T11:29:17.7260428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7260660Z fn() 2025-12-04T11:29:17.7260863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7261093Z method(*args, **kwargs) 2025-12-04T11:29:17.7261315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7261544Z method(*args, **kwargs) 2025-12-04T11:29:17.7261761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7261990Z with policy(): 2025-12-04T11:29:17.7262202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7262435Z raise RuntimeError(msg) 2025-12-04T11:29:17.7262847Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:29:17.7263224Z 2025-12-04T11:29:17.7263298Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7263688Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7263947Z 2025-12-04T11:29:17.7264032Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7264158Z 2025-12-04T11:29:17.7264160Z 2025-12-04T11:29:17.7264235Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:17.7264435Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:29:17.7264804Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-7ccc9be6dff28e59.xml - 2025-12-04T11:29:17.7265145Z =========================== short test summary info ============================ 2025-12-04T11:29:17.7265491Z FAILED [21.6340s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:29:17.7265814Z Traceback (most recent call last): 2025-12-04T11:29:17.7266057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7266297Z getattr(self, test_name)() 2025-12-04T11:29:17.7266529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7266793Z fn() 2025-12-04T11:29:17.7266993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7267221Z method(*args, **kwargs) 2025-12-04T11:29:17.7267438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7267664Z method(*args, **kwargs) 2025-12-04T11:29:17.7267881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7268104Z with policy(): 2025-12-04T11:29:17.7268315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7268561Z raise RuntimeError(msg) 2025-12-04T11:29:17.7268968Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:29:17.7269352Z 2025-12-04T11:29:17.7269425Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7269796Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7270059Z 2025-12-04T11:29:17.7270148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7270334Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:29:17.7270489Z ============================== 1 failed in 21.79s ============================== 2025-12-04T11:29:17.7270618Z Got exit code 1 2025-12-04T11:29:17.7270713Z Retrying single test... 
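The two UserWarnings in the retry output above point at the same cleanup theme as the leak check: FSDP recommends passing `device_id` when the wrapped module starts on CPU, and ProcessGroupNCCL warns that `destroy_process_group()` was never called before exit. A minimal, hypothetical world-size-1 setup that follows both recommendations could look like the sketch below; the rendezvous address, port, and toy model are placeholders and are not taken from the test.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Placeholder single-process rendezvous; the real test uses its own multiprocess harness.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)  # RCCL backs the "nccl" backend on ROCm

model = nn.Linear(1024, 1024)                                 # built on CPU, as in the warning
fsdp_model = FSDP(model, device_id=torch.device("cuda", 0))   # lets FSDP move it to GPU for sharding init

# ... forward/backward passes would go here ...

dist.destroy_process_group()  # explicit shutdown, avoiding the ProcessGroupNCCL warning seen above
```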
2025-12-04T11:29:17.7270975Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-7a5f0417d5f82ba7.xml 2025-12-04T11:29:17.7271268Z ============================= test session starts ============================== 2025-12-04T11:29:17.7271478Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:17.7271666Z cachedir: .pytest_cache 2025-12-04T11:29:17.7271929Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:17.7272167Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:29:17.7272283Z configfile: pytest.ini 2025-12-04T11:29:17.7272512Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:17.7272752Z collecting ... collected 1 item 2025-12-04T11:29:17.7273041Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:29:17.7273338Z Running 1 items in this shard 2025-12-04T11:29:17.7273408Z 2025-12-04T11:29:17.7273715Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 11:28:53.304000 128060 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 128129 2025-12-04T11:29:17.7274351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T11:29:17.7274718Z _init_core_state( 2025-12-04T11:29:17.7275350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T11:29:17.7276023Z _warn_cpu_init() 2025-12-04T11:29:17.7276224Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:29:17.7276566Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:29:17.7277051Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7277528Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:29:17.7278004Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7278451Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:29:17.7278892Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7279354Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:29:17.7279850Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7280312Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:29:17.7280773Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7281258Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:29:17.7281717Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7282178Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:29:17.7282834Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 
2025-12-04T11:29:17.7283450Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:29:17.7283799Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7284382Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7284911Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:29:17.7285271Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7285687Z [rank0]:E1204 11:29:13.576000 128129 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:29:17.7285931Z dist init r=0, world=1 2025-12-04T11:29:17.7285994Z 2025-12-04T11:29:17.7286030Z rank0: 2025-12-04T11:29:17.7286233Z e1: {'cpu_iter': 0.0007750598999999525, 'cpu_wait': 2.6316000000115024e-05, 'gpu_compute': 0.017363999970257282, 'gpu_total': 0.3225758999586105} 2025-12-04T11:29:17.7286564Z e2: {'cpu_iter': 0.0018774933000003102, 'cpu_wait': 1.899099999977949e-05, 'gpu_compute': 0.037371299928054214, 'gpu_total': 0.7534022033214569} 2025-12-04T11:29:17.7286880Z e3: {'cpu_iter': 0.0013932615000001648, 'cpu_wait': 0.3960319146999998, 'gpu_compute': 397.6953666687012, 'gpu_total': 397.93882446289064} 2025-12-04T11:29:17.7287193Z e4: {'cpu_iter': 0.0031471187000001065, 'cpu_wait': 0.7516728245000003, 'gpu_compute': 397.7338554382324, 'gpu_total': 398.10332946777345} 2025-12-04T11:29:17.7287716Z [rank0]:[W1204 11:29:13.757007657 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:29:17.7288123Z FAILED [21.6327s] [100%] 2025-12-04T11:29:17.7288187Z 2025-12-04T11:29:17.7288244Z =================================== FAILURES =================================== 2025-12-04T11:29:17.7288437Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T11:29:17.7288619Z Traceback (most recent call last): 2025-12-04T11:29:17.7288864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:29:17.7289111Z self._join_processes(fn) 2025-12-04T11:29:17.7289357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:29:17.7289653Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:29:17.7289957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:29:17.7290215Z raise RuntimeError(error) 2025-12-04T11:29:17.7290363Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:29:17.7290523Z Traceback (most recent call last): 2025-12-04T11:29:17.7290760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7291002Z getattr(self, test_name)() 2025-12-04T11:29:17.7291236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7291466Z fn() 2025-12-04T11:29:17.7291666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7291895Z method(*args, **kwargs) 2025-12-04T11:29:17.7292118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7292345Z method(*args, **kwargs) 2025-12-04T11:29:17.7292561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7292786Z with policy(): 2025-12-04T11:29:17.7292995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7293224Z raise RuntimeError(msg) 2025-12-04T11:29:17.7293666Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:29:17.7294041Z 2025-12-04T11:29:17.7294117Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7294451Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7294711Z 2025-12-04T11:29:17.7294798Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7294922Z 2025-12-04T11:29:17.7294924Z 2025-12-04T11:29:17.7294998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:17.7295194Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:29:17.7295563Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-7a5f0417d5f82ba7.xml - 2025-12-04T11:29:17.7295900Z =========================== short test summary info ============================ 2025-12-04T11:29:17.7296247Z FAILED [21.6327s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:29:17.7296570Z Traceback (most recent call last): 2025-12-04T11:29:17.7296812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:29:17.7297054Z getattr(self, test_name)() 2025-12-04T11:29:17.7297284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:29:17.7297513Z fn() 2025-12-04T11:29:17.7297713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7297940Z method(*args, **kwargs) 2025-12-04T11:29:17.7298157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:29:17.7298399Z method(*args, **kwargs) 2025-12-04T11:29:17.7298643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:29:17.7298868Z with policy(): 2025-12-04T11:29:17.7299078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:29:17.7299308Z raise RuntimeError(msg) 2025-12-04T11:29:17.7299756Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 1633681408 and is now 1669332992. 2025-12-04T11:29:17.7300134Z 2025-12-04T11:29:17.7300210Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:17.7300541Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T11:29:17.7300797Z 2025-12-04T11:29:17.7300886Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:17.7301072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:29:17.7301226Z ============================== 1 failed in 21.79s ============================== 2025-12-04T11:29:17.7301355Z Got exit code 1 2025-12-04T11:29:17.7301585Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T11:29:17.7301920Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:29:17.7302327Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-ee596d64f21d92a2.xml 2025-12-04T11:29:17.7302621Z ============================= test session starts ============================== 2025-12-04T11:29:17.7302832Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:17.7303023Z cachedir: .pytest_cache 2025-12-04T11:29:17.7303246Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:17.7303482Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:29:17.7303597Z configfile: pytest.ini 2025-12-04T11:29:17.7303823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:17.7304094Z collecting ... collected 1 item / 1 deselected / 0 selected 2025-12-04T11:29:17.7304250Z stepcurrent: skipping 1 already run items. 2025-12-04T11:29:17.7304378Z Running 0 items in this shard 2025-12-04T11:29:17.7304450Z 2025-12-04T11:29:17.7304694Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-ee596d64f21d92a2.xml - 2025-12-04T11:29:17.7305033Z ============================ 1 deselected in 0.00s ============================= 2025-12-04T11:29:17.7305334Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda'] 2025-12-04T11:29:17.7305572Z 2025-12-04T11:29:17.7305766Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_576f9da47548da7f_.log) 2025-12-04T11:29:17.7305994Z 2025-12-04T11:29:17.7306121Z Finished distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 11:29:17.719179][5225798.698215014], took 1.24min 2025-12-04T11:29:17.7306549Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:29:17.7306938Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:29:17.7307153Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:29:17.7307363Z Uploading artifacts took 0.00 seconds 2025-12-04T11:29:17.7307500Z distributed/fsdp/test_fsdp_overlap 1/1 failed! 2025-12-04T11:29:17.7307695Z Running distributed/test_functional_api 1/1 ... [2025-12-04 11:29:17.721807][5225798.7008489] 2025-12-04T11:29:17.7307885Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:29:17.7308281Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_functional_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:29:17.722015] 2025-12-04T11:31:06.3428328Z 2025-12-04T11:31:06.3431754Z distributed/test_functional_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_functional_api_1.1_4cf011bcbb1b8894_.log 2025-12-04T11:31:06.3435804Z Running 11 items in this shard: test/distributed/test_functional_api.py::TestMetaCollectives::test_all_reduce, test/distributed/test_functional_api.py::TestMakeFx::test_all_reduce_tracing, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_gather_into_tensor_coalesced_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_1d_input_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_split_sizes_none_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_with_dce_code_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_with_fakepg_cuda, test/distributed/test_functional_api.py::TestDistributedBackendCollectivesWithWorldSize4CUDA::test_permute_tensor_with_sub_group_cuda, test/distributed/test_functional_api.py::TestFunctionalAutogradWithDistributedBackendCUDA::test_all_to_all_single_cuda 2025-12-04T11:31:06.3440221Z 2025-12-04T11:31:06.3440463Z Finished distributed/test_functional_api 1/1 ... [2025-12-04 11:31:06.342511][5225907.321548056], took 1.81min 2025-12-04T11:31:06.3441281Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:31:06.3454977Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:31:06.3455647Z Running distributed/_composable/test_composability/test_2d_composability 1/1 ... [2025-12-04 11:31:06.345430][5225907.324471418] 2025-12-04T11:31:06.3455975Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:31:06.3457999Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_composability/test_2d_composability.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:31:06.345620] 2025-12-04T11:33:37.8860707Z 2025-12-04T11:33:37.8861819Z distributed/_composable/test_composability/test_2d_composability 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_composability.test_2d_composability_1.1_198d128d8c5f883d_.log 2025-12-04T11:33:37.8870943Z Running 18 items in this shard: test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_tp_with_fsdp_offloading, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_train_parity_2d_mlp, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_train_parity_2d_transformer, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DTraining::test_train_parity_2d_transformer_checkpoint_resume, test/distributed/_composable/test_composability/test_2d_composability.py::TestFullyShard2DStateDict::test_fully_shard_tp_2d_set_full_state_dict, test/distributed/_composable/test_composability/test_2d_composability.py::Test2dFSDP1ParallelIntegration::test_2d_ddp_integration_functionality, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_e2e_training_default, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_e2e_training_not_use_orig_params, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_e2e_training_use_orig_params, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelTraining::test_2d_fsdp_state_enable_extension, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_load_state_dict_is_even_sharded_model_False, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_load_state_dict_is_even_sharded_model_True, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_optim_state_dict_is_even_sharded_model_False, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_optim_state_dict_is_even_sharded_model_True, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_state_dict_is_even_sharded_model_False, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_2d_state_dict_is_even_sharded_model_True, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_fsdp1_tp_2d_set_full_state_dict, test/distributed/_composable/test_composability/test_2d_composability.py::TestNew2dParallelStateDict::test_fsdp_2d_extension 2025-12-04T11:33:37.8877658Z 2025-12-04T11:33:37.8877936Z Finished distributed/_composable/test_composability/test_2d_composability 1/1 ... 
[2025-12-04 11:33:37.885864][5226058.864901258], took 2.53min 2025-12-04T11:33:37.8878585Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:33:37.8884455Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:33:37.8886916Z Running distributed/fsdp/test_fsdp_optim_state 1/1 ... [2025-12-04 11:33:37.888592][5226058.867633463] 2025-12-04T11:33:37.8887198Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:33:37.8889096Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_optim_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:33:37.888791] 2025-12-04T11:41:42.2205552Z 2025-12-04T11:41:42.2206765Z distributed/fsdp/test_fsdp_optim_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_optim_state_1.1_18ae9281748a32e9_.log 2025-12-04T11:41:42.2232274Z Running 60 items in this shard: test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_compatible_with_trec, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_flatten_sharded_optim_state_dict_nested, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_flatten_sharded_optim_state_dict_transformer, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_full_optim_state_dict_keys, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_full_optim_state_dict_nested_invalid, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_interface_arguments, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_no_grad, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_input_warning, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type0_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True, 
test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_dict_nested_state_dict_type1_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_optim_state_without_param_groups, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type0_use_multiple_param_groups_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type0_use_multiple_param_groups_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type1_use_multiple_param_groups_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_ids_state_dict_type1_use_multiple_param_groups_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_rekey_optim_state_dict_to_names, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_save_load_without_0th_param_state_state_dict_type0, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_save_load_without_0th_param_state_state_dict_type1, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_halve_world_size, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True, 
test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_scatter_full_optim_state_dict_transformer, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_halve_world_size, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_transformer, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type0_add_to_fsdp_module_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type0_add_to_fsdp_module_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type1_add_to_fsdp_module_False, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_shard_full_optim_state_dict_unmanaged_params_state_dict_type1_add_to_fsdp_module_True, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_state_dict_with_none_tensor_state, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_use_orig_params, test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_with_empty_optimizer_state, 
test/distributed/fsdp/test_fsdp_optim_state.py::TestFSDPOptimState::test_with_no_shard 2025-12-04T11:41:42.2245828Z 2025-12-04T11:41:42.2245971Z Finished distributed/fsdp/test_fsdp_optim_state 1/1 ... [2025-12-04 11:41:42.221991][5226543.201028217], took 8.07min 2025-12-04T11:41:42.2246422Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:41:42.2246817Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:41:42.2247060Z Running distributed/tensor/test_view_ops 1/1 ... [2025-12-04 11:41:42.224203][5226543.203244639] 2025-12-04T11:41:42.2247262Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:41:42.2247671Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:41:42.224416] 2025-12-04T11:47:01.3573656Z 2025-12-04T11:47:01.3574950Z distributed/tensor/test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_view_ops_1.1_fda6ad413b66c16d_.log 2025-12-04T11:47:01.3580852Z Running 20 items in this shard: test/distributed/tensor/test_view_ops.py::TestViewOps::test_complex_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOps::test_dtensor_view_op_uneven, test/distributed/tensor/test_view_ops.py::TestViewOps::test_illegal_views, test/distributed/tensor/test_view_ops.py::TestViewOps::test_squeeze_, test/distributed/tensor/test_view_ops.py::TestViewOps::test_storage_offset_shard_dim0_slice_dim1, test/distributed/tensor/test_view_ops.py::TestViewOps::test_storage_offset_shard_dim1_slice_dim0, test/distributed/tensor/test_view_ops.py::TestViewOps::test_storage_offset_slice, test/distributed/tensor/test_view_ops.py::TestViewOps::test_view_groups, test/distributed/tensor/test_view_ops.py::TestViewOps::test_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOps::test_view_redistribution, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_complex_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_dtensor_view_op_uneven, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_illegal_views, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_squeeze_, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_storage_offset_shard_dim0_slice_dim1, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_storage_offset_shard_dim1_slice_dim0, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_storage_offset_slice, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_view_groups, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_view_ops, test/distributed/tensor/test_view_ops.py::TestViewOpsWithLocalTensor::test_view_redistribution 2025-12-04T11:47:01.3583484Z 2025-12-04T11:47:01.3583621Z Finished distributed/tensor/test_view_ops 1/1 ... 
[2025-12-04 11:47:01.357131][5226862.336169338], took 5.32min 2025-12-04T11:47:01.3584108Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:47:01.3593848Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:47:01.3596543Z Running distributed/fsdp/test_fsdp_state_dict 2/2 ... [2025-12-04 11:47:01.359513][5226862.338554008] 2025-12-04T11:47:01.3596765Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:47:01.3598135Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_state_dict.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:47:01.359693] 2025-12-04T11:55:04.3034348Z 2025-12-04T11:55:04.3034931Z distributed/fsdp/test_fsdp_state_dict 2/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_state_dict_2.2_1e2e4278ab69a39a_.log 2025-12-04T11:55:04.3061746Z Running 101 items in this shard: test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_sharded_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_local_state_dict_with_empty_ranks, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_sharded_load_multi_backend_pg, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_shared_module_and_shared_parameter, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False_fsdp_root_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True_fsdp_root_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True_fsdp_root_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_rank0_offload_save_load_flow_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_save_load_flow_state_dict_type_local_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_save_load_flow_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_skip_module_state_dict_type_local_state_dict_double_nest_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_skip_module_state_dict_type_sharded_state_dict_double_nest_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_skip_module_state_dict_type_state_dict_double_nest_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_manual_ac_wrapper_state_dict_type_sharded_state_dict_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_manual_ac_wrapper_state_dict_type_sharded_state_dict_rank0_only_and_offload_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_manual_ac_wrapper_state_dict_type_state_dict_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_manual_ac_wrapper_state_dict_type_state_dict_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_shared_parameters_state_dict_type_local_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_shared_parameters_state_dict_type_sharded_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_world_size_one, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_wrong_state_dict_config
2025-12-04T11:55:04.3087224Z 
2025-12-04T11:55:04.3087363Z Finished distributed/fsdp/test_fsdp_state_dict 2/2 ... [2025-12-04 11:55:04.303816][5227345.282855816], took 8.05min
2025-12-04T11:55:04.3087814Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml
2025-12-04T11:55:04.3088208Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T11:55:04.3088429Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading
2025-12-04T11:55:04.3088609Z Uploading artifacts took 0.00 seconds
2025-12-04T11:55:04.3088806Z Running distributed/fsdp/test_fsdp_exec_order 1/1 ... [2025-12-04 11:55:04.306031][5227345.285072508]
2025-12-04T11:55:04.3089008Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T11:55:04.3089415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_exec_order.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:55:04.306215]
2025-12-04T11:59:23.3237598Z 
2025-12-04T11:59:23.3238478Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order 1/1 (test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_b71b860b1a78e6ee_.log)
2025-12-04T11:59:23.3239128Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-6899852671441b60.xml
2025-12-04T11:59:23.3239645Z ============================= test session starts ==============================
2025-12-04T11:59:23.3239931Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:59:23.3240184Z cachedir: .pytest_cache
2025-12-04T11:59:23.3240486Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:59:23.3240825Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:59:23.3241027Z configfile: pytest.ini
2025-12-04T11:59:23.3241346Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:59:23.3241701Z collecting ... 
collected 8 items 2025-12-04T11:59:23.3241954Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:59:23.3243972Z Running 8 items in this shard: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.3246698Z 2025-12-04T11:59:23.3247114Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 11:55:06.087000 205911 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 205980 2025-12-04T11:59:23.3247762Z I1204 11:55:06.088000 205911 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 205981 2025-12-04T11:59:23.3248201Z I1204 11:55:06.088000 205911 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 205982 2025-12-04T11:59:23.3248631Z I1204 11:55:06.089000 205911 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 205983 2025-12-04T11:59:23.3249544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3250206Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3250808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3251593Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3252191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
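Note: the FSDP UserWarning above is advisory and is not what makes the test fail. It asks that each rank either call torch.cuda.set_device() before constructing FSDP, or pass an explicit device index as `device_id` instead of the bare "cuda" device. A minimal sketch of both options, assuming the default process group has already been initialized by the test harness; the function and variable names here are illustrative, not taken from the test:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model_for_rank(local_rank: int) -> FSDP:
        # Assumes dist.init_process_group(...) was already called by the harness.
        assert dist.is_initialized()
        # Option 1: bind this process to its GPU first, so a bare "cuda" device
        # (and every later allocation) resolves to the right index.
        torch.cuda.set_device(local_rank)
        model = nn.Linear(8, 8)
        # Option 2: pass an explicit index as device_id rather than plain "cuda",
        # which is what triggered the warning in this log.
        return FSDP(model, device_id=torch.device("cuda", local_rank))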
2025-12-04T11:59:23.3252790Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3253382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3253978Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3254224Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3254586Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3255112Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3255617Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3256128Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3256593Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3257090Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3257668Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3258197Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3258690Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3259174Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3259689Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3260149Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3260628Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3261461Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3262101Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3262461Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3263086Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3263614Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3263992Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3264817Z [rank1]:E1204 11:55:11.296000 205981 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3265066Z dist init r=1, world=4 2025-12-04T11:59:23.3265273Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3265610Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3266099Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3266585Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3267113Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3267566Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3268008Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3268475Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3268947Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3269409Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3269930Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3270380Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3270838Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3271338Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3272014Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3272644Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3272995Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3273604Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3274125Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3274494Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3274911Z [rank0]:E1204 11:55:11.317000 205980 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3275153Z dist init r=0, world=4 2025-12-04T11:59:23.3275355Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3275698Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3276186Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3276699Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3277183Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3277632Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3278074Z [rank2]:E1204 11:55:11.343000 205982 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3278538Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3279005Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3279467Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3280005Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3280520Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3280979Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3281456Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3282139Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 2300575744 and is now 3055550464. 
2025-12-04T11:59:23.3282773Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3283121Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3283727Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3284247Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3284612Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3285030Z [rank2]:E1204 11:55:11.343000 205982 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3285272Z dist init r=2, world=4 2025-12-04T11:59:23.3285476Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3285816Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3286351Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3286832Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3287312Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3287762Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3288204Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3288672Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3289138Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3289676Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3290142Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3290599Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.3291054Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3291520Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3292194Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3292828Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3293179Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3293788Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3294309Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3294678Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3295093Z [rank3]:E1204 11:55:11.386000 205983 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3295363Z dist init r=3, world=4 2025-12-04T11:59:23.3295471Z FAILED [6.1141s] [ 12%] 2025-12-04T11:59:23.3295536Z 2025-12-04T11:59:23.3295598Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3295801Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T11:59:23.3295989Z Traceback (most recent call last): 2025-12-04T11:59:23.3296238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3296489Z self._join_processes(fn) 2025-12-04T11:59:23.3296738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3297006Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3297276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3297541Z raise RuntimeError(error) 2025-12-04T11:59:23.3297694Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3297860Z Traceback (most recent call last): 2025-12-04T11:59:23.3298102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3298347Z getattr(self, test_name)() 2025-12-04T11:59:23.3298579Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3298845Z fn() 2025-12-04T11:59:23.3299051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3299286Z method(*args, **kwargs) 2025-12-04T11:59:23.3299510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3299781Z method(*args, **kwargs) 2025-12-04T11:59:23.3300004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3300235Z with policy(): 2025-12-04T11:59:23.3300451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3300686Z raise RuntimeError(msg) 2025-12-04T11:59:23.3301117Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3301516Z 2025-12-04T11:59:23.3301595Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3301958Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3302240Z 2025-12-04T11:59:23.3302332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3302459Z 2025-12-04T11:59:23.3302461Z 2025-12-04T11:59:23.3302543Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3302745Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.3303123Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-6899852671441b60.xml -
2025-12-04T11:59:23.3303469Z =========================== short test summary info ============================
2025-12-04T11:59:23.3303874Z FAILED [6.1141s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T11:59:23.3304216Z Traceback (most recent call last):
2025-12-04T11:59:23.3304460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T11:59:23.3304708Z getattr(self, test_name)()
2025-12-04T11:59:23.3304943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T11:59:23.3305178Z fn()
2025-12-04T11:59:23.3305382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:59:23.3305614Z method(*args, **kwargs)
2025-12-04T11:59:23.3305834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:59:23.3306064Z method(*args, **kwargs)
2025-12-04T11:59:23.3306287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:59:23.3306518Z with policy():
2025-12-04T11:59:23.3306731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:59:23.3306965Z raise RuntimeError(msg)
2025-12-04T11:59:23.3307390Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680.
2025-12-04T11:59:23.3307815Z 
2025-12-04T11:59:23.3307889Z To execute this test, run the following from the base repo dir:
2025-12-04T11:59:23.3308247Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T11:59:23.3308528Z 
2025-12-04T11:59:23.3308618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:59:23.3308809Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:59:23.3308969Z ============================== 1 failed in 6.12s ===============================
2025-12-04T11:59:23.3309102Z Got exit code 1
2025-12-04T11:59:23.3309200Z Retrying single test...
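After this first failure the harness retries the failing test in isolation, which is the second pytest session below. To reproduce the failure outside CI, the log already prints the exact command; a small wrapper with the same environment variables is sketched here, assuming a ROCm (or CUDA) build of torch is installed and a pytorch checkout is the working directory:

    import os
    import subprocess

    # Same command the log prints under "To execute this test, run the following
    # from the base repo dir"; the environment variables are taken verbatim from it.
    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
        # PYTORCH_PRINT_REPRO_ON_FAILURE="0",  # uncomment to silence the repro banner
    )
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_exec_order.py",
            "TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda",
        ],
        env=env,
        check=True,
    )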
2025-12-04T11:59:23.3309475Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ffa467cb0a1c4f9d.xml 2025-12-04T11:59:23.3309814Z ============================= test session starts ============================== 2025-12-04T11:59:23.3310030Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3310221Z cachedir: .pytest_cache 2025-12-04T11:59:23.3310446Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3310688Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3310809Z configfile: pytest.ini 2025-12-04T11:59:23.3311039Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3311308Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3311652Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3311970Z Running 1 items in this shard 2025-12-04T11:59:23.3312046Z 2025-12-04T11:59:23.3312370Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 11:55:14.889000 206289 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 206358 2025-12-04T11:59:23.3312928Z I1204 11:55:14.890000 206289 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 206359 2025-12-04T11:59:23.3313275Z I1204 11:55:14.890000 206289 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 206360 2025-12-04T11:59:23.3313618Z I1204 11:55:14.891000 206289 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 206361 2025-12-04T11:59:23.3314307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3314899Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3315481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3316064Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3316646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3317261Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3317842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3318427Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3318667Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3319013Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3319505Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3320033Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3320514Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3320963Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3321413Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3321877Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3322367Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3322835Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3323299Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3323753Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3324208Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3324674Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3325355Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 2300575744 and is now 3055550464. 2025-12-04T11:59:23.3326021Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3326372Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3326981Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3327513Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3327884Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3328304Z [rank2]:E1204 11:55:20.153000 206360 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3328547Z dist init r=2, world=4 2025-12-04T11:59:23.3328750Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3329088Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3329613Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3330092Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3330571Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3331023Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3331493Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3331964Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3332429Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3332894Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3333358Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3333814Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3334276Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3334743Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3335415Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3336076Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3336427Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3337031Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3337550Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3337916Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3338333Z [rank1]:E1204 11:55:20.156000 206359 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3338576Z dist init r=1, world=4 2025-12-04T11:59:23.3338778Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3339114Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3339648Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3340129Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3340609Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3341089Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3341532Z [rank3]:E1204 11:55:20.163000 206361 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3341996Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3342466Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3342929Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3343392Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3343843Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3344296Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3344795Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3345474Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 
2025-12-04T11:59:23.3346102Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3346454Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3347062Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3347587Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3347954Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3348370Z [rank3]:E1204 11:55:20.163000 206361 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3348613Z dist init r=3, world=4 2025-12-04T11:59:23.3348819Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3349161Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3349692Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3350215Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3350700Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3351147Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3351590Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3352061Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3352525Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3352987Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3353451Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3353935Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.3354390Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3354857Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3355531Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3356170Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3356519Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3357125Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3357649Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3358014Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3358427Z [rank0]:E1204 11:55:20.171000 206358 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3358670Z dist init r=0, world=4 2025-12-04T11:59:23.3358773Z FAILED [6.2150s] [100%] 2025-12-04T11:59:23.3358837Z 2025-12-04T11:59:23.3358897Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3359098Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T11:59:23.3359307Z Traceback (most recent call last): 2025-12-04T11:59:23.3359555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3359850Z self._join_processes(fn) 2025-12-04T11:59:23.3360101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3360366Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3360637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3360900Z raise RuntimeError(error) 2025-12-04T11:59:23.3361053Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3361218Z Traceback (most recent call last): 2025-12-04T11:59:23.3361462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3361708Z getattr(self, test_name)() 2025-12-04T11:59:23.3361944Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3362179Z fn() 2025-12-04T11:59:23.3362384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3362617Z method(*args, **kwargs) 2025-12-04T11:59:23.3362881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3363112Z method(*args, **kwargs) 2025-12-04T11:59:23.3363333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3363563Z with policy(): 2025-12-04T11:59:23.3363781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3364015Z raise RuntimeError(msg) 2025-12-04T11:59:23.3364445Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3364838Z 2025-12-04T11:59:23.3364917Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3365287Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3365567Z 2025-12-04T11:59:23.3365658Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3365782Z 2025-12-04T11:59:23.3365784Z 2025-12-04T11:59:23.3365866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3366067Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.3366448Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ffa467cb0a1c4f9d.xml - 2025-12-04T11:59:23.3366797Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3367159Z FAILED [6.2150s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3367503Z Traceback (most recent call last): 2025-12-04T11:59:23.3367749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3367996Z getattr(self, test_name)() 2025-12-04T11:59:23.3368266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3368505Z fn() 2025-12-04T11:59:23.3368709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3368943Z method(*args, **kwargs) 2025-12-04T11:59:23.3369165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3369400Z method(*args, **kwargs) 2025-12-04T11:59:23.3369659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3369890Z with policy(): 2025-12-04T11:59:23.3370105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3370340Z raise RuntimeError(msg) 2025-12-04T11:59:23.3370777Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3371171Z 2025-12-04T11:59:23.3371246Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3371601Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3371915Z 2025-12-04T11:59:23.3372003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3372195Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3372364Z ======================= 1 failed, 7 deselected in 6.22s ======================== 2025-12-04T11:59:23.3372506Z Got exit code 1 2025-12-04T11:59:23.3372610Z Retrying single test... 
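"Retrying single test..." means the runner re-executes only the failing test in a fresh process before declaring it consistently failed. Locally, the printed repro command can be reproduced with a small wrapper like the one below; the function name is illustrative, and the environment variables are exactly the ones shown in the failure message.

    import os
    import subprocess
    import sys

    def rerun_repro(test_file: str, test_name: str) -> int:
        # Environment variables are the ones printed in the repro command above.
        env = dict(os.environ)
        env["PYTORCH_TEST_WITH_ROCM"] = "1"
        env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
        return subprocess.run([sys.executable, test_file, test_name], env=env).returncode

    # Example, mirroring the printed command:
    # rerun_repro("test/distributed/fsdp/test_fsdp_exec_order.py",
    #             "TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda")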
2025-12-04T11:59:23.3372881Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7749d1f1d0257bb2.xml 2025-12-04T11:59:23.3373182Z ============================= test session starts ============================== 2025-12-04T11:59:23.3373399Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3373588Z cachedir: .pytest_cache 2025-12-04T11:59:23.3373819Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3374063Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3374184Z configfile: pytest.ini 2025-12-04T11:59:23.3374417Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3374692Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3375042Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3375361Z Running 1 items in this shard 2025-12-04T11:59:23.3375437Z 2025-12-04T11:59:23.3375769Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 11:55:23.461000 206667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 206736 2025-12-04T11:59:23.3376305Z I1204 11:55:23.462000 206667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 206737 2025-12-04T11:59:23.3376662Z I1204 11:55:23.462000 206667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 206738 2025-12-04T11:59:23.3377043Z I1204 11:55:23.463000 206667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 206739 2025-12-04T11:59:23.3377751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3378351Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3378948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3379544Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3380190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
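The UserWarning above is raised because the test passes `device_id` as a bare `cuda` device with no index, so FSDP falls back to the current device on each rank. Outside of this test, the warning goes away if the rank's device is made explicit, for example as sketched below (assuming a process group has already been initialized for `rank`; the helper name is illustrative).

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_explicit_device(module: torch.nn.Module, rank: int) -> FSDP:
        # Assumes torch.distributed.init_process_group() has already run for this rank.
        torch.cuda.set_device(rank)                                 # option 1: pin the current device
        return FSDP(module, device_id=torch.device("cuda", rank))   # option 2: pass an indexed device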
2025-12-04T11:59:23.3380788Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3381414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3382013Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3382258Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3382607Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3383108Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3383605Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3384095Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3384551Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3385000Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3385473Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3385948Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3386423Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3386927Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3387387Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3387852Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3388330Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3389019Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3389704Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3390065Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3391872Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3392407Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3392783Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3393209Z [rank1]:E1204 11:55:28.604000 206737 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3393456Z dist init r=1, world=4 2025-12-04T11:59:23.3393661Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3394009Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3394507Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3394997Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3395484Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3395939Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3396396Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3396867Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3397371Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3397841Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3398308Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3398769Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3399230Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3399750Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3400435Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3401109Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3416117Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3416778Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3417315Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3417685Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3418101Z [rank0]:E1204 11:55:28.605000 206736 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3418348Z dist init r=0, world=4 2025-12-04T11:59:23.3418558Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3418904Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3419401Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3419942Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3420418Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3420869Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3421373Z [rank2]:E1204 11:55:28.652000 206738 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3421836Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3422299Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3422757Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3423218Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3423684Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3424153Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3424631Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3425313Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 2300575744 and is now 3055550464. 
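For scale, the driver-allocated byte counts in these messages translate to roughly 0.7 GiB of growth per device, while the caching-allocator growth is only 2 KiB (512 -> 2560 bytes). Plain arithmetic on the values logged for device 2:

    # Values taken from the device-2 message above; plain arithmetic, nothing torch-specific.
    driver_before, driver_after = 2300575744, 3055550464
    growth = driver_after - driver_before
    print(growth, growth / 2**20, growth / 2**30)  # 754974720 bytes = 720.0 MiB ≈ 0.70 GiB
    print(2560 - 512)                              # caching-allocator growth: 2048 bytes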
2025-12-04T11:59:23.3425988Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3426347Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3426960Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3427484Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3427856Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3428277Z [rank2]:E1204 11:55:28.652000 206738 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3428526Z dist init r=2, world=4 2025-12-04T11:59:23.3428737Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3429080Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3429623Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3430120Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3430609Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3431109Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3431557Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3432028Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3432499Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3432968Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3433441Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3433902Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.3434367Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3434872Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3435557Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3436200Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3436554Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3437162Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3437691Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3438069Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3438491Z [rank3]:E1204 11:55:28.655000 206739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3438742Z dist init r=3, world=4 2025-12-04T11:59:23.3438851Z FAILED [6.1151s] [100%] 2025-12-04T11:59:23.3438918Z 2025-12-04T11:59:23.3438985Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3439194Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T11:59:23.3439388Z Traceback (most recent call last): 2025-12-04T11:59:23.3439683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3439942Z self._join_processes(fn) 2025-12-04T11:59:23.3440228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3440509Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3440787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3441057Z raise RuntimeError(error) 2025-12-04T11:59:23.3441220Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3441392Z Traceback (most recent call last): 2025-12-04T11:59:23.3441644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3441898Z getattr(self, test_name)() 2025-12-04T11:59:23.3442138Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3442379Z fn() 2025-12-04T11:59:23.3442593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3442833Z method(*args, **kwargs) 2025-12-04T11:59:23.3443062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3443300Z method(*args, **kwargs) 2025-12-04T11:59:23.3443527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3443795Z with policy(): 2025-12-04T11:59:23.3444021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3444261Z raise RuntimeError(msg) 2025-12-04T11:59:23.3444698Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3445104Z 2025-12-04T11:59:23.3445184Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3445550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3445839Z 2025-12-04T11:59:23.3445931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3446071Z 2025-12-04T11:59:23.3446134Z Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3446285Z Traceback (most recent call last): 2025-12-04T11:59:23.3446539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3446791Z getattr(self, test_name)() 2025-12-04T11:59:23.3447040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3447280Z fn() 2025-12-04T11:59:23.3447489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3447728Z method(*args, **kwargs) 2025-12-04T11:59:23.3447955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3448193Z method(*args, **kwargs) 2025-12-04T11:59:23.3448423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3448655Z with policy(): 2025-12-04T11:59:23.3448874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3449114Z raise RuntimeError(msg) 2025-12-04T11:59:23.3449626Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 
2025-12-04T11:59:23.3450028Z 2025-12-04T11:59:23.3450107Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3450468Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3450756Z 2025-12-04T11:59:23.3450846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3450978Z 2025-12-04T11:59:23.3450980Z 2025-12-04T11:59:23.3451061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3451274Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3451670Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7749d1f1d0257bb2.xml - 2025-12-04T11:59:23.3452026Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3452401Z FAILED [6.1151s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3452782Z Traceback (most recent call last): 2025-12-04T11:59:23.3453035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3453288Z getattr(self, test_name)() 2025-12-04T11:59:23.3453528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3453769Z fn() 2025-12-04T11:59:23.3453979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3454217Z method(*args, **kwargs) 2025-12-04T11:59:23.3454443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3454680Z method(*args, **kwargs) 2025-12-04T11:59:23.3454903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3455137Z with policy(): 2025-12-04T11:59:23.3455350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3455583Z raise RuntimeError(msg) 2025-12-04T11:59:23.3456015Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 
2025-12-04T11:59:23.3456409Z 2025-12-04T11:59:23.3456484Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3456841Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3457123Z 2025-12-04T11:59:23.3457211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3457343Z 2025-12-04T11:59:23.3457402Z Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3457544Z Traceback (most recent call last): 2025-12-04T11:59:23.3457786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3458033Z getattr(self, test_name)() 2025-12-04T11:59:23.3458295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3458532Z fn() 2025-12-04T11:59:23.3458731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3458958Z method(*args, **kwargs) 2025-12-04T11:59:23.3459180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3459415Z method(*args, **kwargs) 2025-12-04T11:59:23.3459695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3459923Z with policy(): 2025-12-04T11:59:23.3460136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3460369Z raise RuntimeError(msg) 2025-12-04T11:59:23.3460797Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 2025-12-04T11:59:23.3461188Z 2025-12-04T11:59:23.3461261Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3461613Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3461925Z 2025-12-04T11:59:23.3462016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3462209Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
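Each attempt also writes a JUnit-style XML report (the "generated xml file:" lines above). Those files can be inspected to pull out just the failure entries; the path below is the one named in this log, relative to the pytorch test directory, and would need adjusting to a local checkout.

    import xml.etree.ElementTree as ET

    # Report file as named in this log; adjust the path for a local run.
    report = ("test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/"
              "distributed.fsdp.test_fsdp_exec_order-7749d1f1d0257bb2.xml")

    for case in ET.parse(report).getroot().iter("testcase"):
        for failure in case.iter("failure"):
            print(f"{case.get('classname')}::{case.get('name')}")
            print((failure.get("message") or "")[:120])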
2025-12-04T11:59:23.3462379Z ======================= 1 failed, 7 deselected in 6.13s ======================== 2025-12-04T11:59:23.3462517Z Got exit code 1 2025-12-04T11:59:23.3462778Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T11:59:23.3463135Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.3463508Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a7122a8fd02eb43a.xml 2025-12-04T11:59:23.3463815Z ============================= test session starts ============================== 2025-12-04T11:59:23.3464037Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3464231Z cachedir: .pytest_cache 2025-12-04T11:59:23.3464459Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3464704Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3464825Z configfile: pytest.ini 2025-12-04T11:59:23.3465058Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3465332Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T11:59:23.3465493Z stepcurrent: skipping 1 already run items. 2025-12-04T11:59:23.3465626Z Running 7 items in this shard 2025-12-04T11:59:23.3465698Z 2025-12-04T11:59:23.3466028Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 11:55:32.050000 207045 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 207114 2025-12-04T11:59:23.3466548Z I1204 11:55:32.050000 207045 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 207115 2025-12-04T11:59:23.3466894Z I1204 11:55:32.051000 207045 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 207116 2025-12-04T11:59:23.3467287Z I1204 11:55:32.051000 207045 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 207117 2025-12-04T11:59:23.3467978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3468566Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3469155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3469796Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3470383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3471005Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3471591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3472182Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3472422Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3472768Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3473263Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3473756Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3474247Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3474697Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3475141Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3475611Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3476078Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3476570Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3477035Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3477489Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3477948Z [rank3]:E1204 11:55:37.234000 207117 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3478415Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3479087Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3479759Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3480108Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3480745Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3481265Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3481629Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3482042Z [rank3]:E1204 11:55:37.234000 207117 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3482284Z dist init r=3, world=4 2025-12-04T11:59:23.3482490Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3482827Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3483317Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3483795Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3484273Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3484720Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3485162Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3485625Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3486117Z [rank0]:E1204 
11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3486578Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3487040Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3487491Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3487949Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3488419Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3489088Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3489785Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3490133Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3490735Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3491254Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3491615Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3492030Z [rank0]:E1204 11:55:37.289000 207114 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3492271Z dist init r=0, world=4 2025-12-04T11:59:23.3492474Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3492814Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3493305Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3493786Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T11:59:23.3494265Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3494714Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3495182Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3495644Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3496104Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3496567Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3497028Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3497480Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3497935Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3498400Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3499102Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 
2025-12-04T11:59:23.3499790Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3500138Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3500741Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3501259Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3501621Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3502034Z [rank1]:E1204 11:55:37.292000 207115 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3502274Z dist init r=1, world=4 2025-12-04T11:59:23.3502477Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3502814Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3503298Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3503779Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3504286Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3504737Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3505174Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3505639Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3506101Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3506564Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3507029Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3507479Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.3507930Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3508427Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3509096Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 2300575744 and is now 3055550464. 2025-12-04T11:59:23.3509761Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3510106Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3510709Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3511230Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3511592Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3512007Z [rank2]:E1204 11:55:37.299000 207116 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3512247Z dist init r=2, world=4 2025-12-04T11:59:23.3512348Z FAILED [6.1154s] [ 14%] 2025-12-04T11:59:23.3512414Z 2025-12-04T11:59:23.3512476Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3512674Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T11:59:23.3512861Z Traceback (most recent call last): 2025-12-04T11:59:23.3513106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3513353Z self._join_processes(fn) 2025-12-04T11:59:23.3513627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3513894Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3514164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3514424Z raise RuntimeError(error) 2025-12-04T11:59:23.3514576Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3514739Z Traceback (most recent call last): 2025-12-04T11:59:23.3514979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3515221Z getattr(self, test_name)() 2025-12-04T11:59:23.3515454Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3515689Z fn() 2025-12-04T11:59:23.3515892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3516124Z method(*args, **kwargs) 2025-12-04T11:59:23.3516344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3516574Z method(*args, **kwargs) 2025-12-04T11:59:23.3516792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3517052Z with policy(): 2025-12-04T11:59:23.3517264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3517498Z raise RuntimeError(msg) 2025-12-04T11:59:23.3517925Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3518323Z 2025-12-04T11:59:23.3518402Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3518755Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3519036Z 2025-12-04T11:59:23.3519128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3519256Z 2025-12-04T11:59:23.3519258Z 2025-12-04T11:59:23.3519335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3519537Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.3519957Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a7122a8fd02eb43a.xml - 2025-12-04T11:59:23.3520304Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3520664Z FAILED [6.1154s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3521001Z Traceback (most recent call last): 2025-12-04T11:59:23.3521386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3521632Z getattr(self, test_name)() 2025-12-04T11:59:23.3521864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3522097Z fn() 2025-12-04T11:59:23.3522333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3522566Z method(*args, **kwargs) 2025-12-04T11:59:23.3522786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3523015Z method(*args, **kwargs) 2025-12-04T11:59:23.3523231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3523455Z with policy(): 2025-12-04T11:59:23.3523669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3523901Z raise RuntimeError(msg) 2025-12-04T11:59:23.3524333Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3524727Z 2025-12-04T11:59:23.3524802Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3525160Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3525440Z 2025-12-04T11:59:23.3525529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3525751Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3525915Z ======================= 1 failed, 1 deselected in 6.13s ======================== 2025-12-04T11:59:23.3526053Z Got exit code 1 2025-12-04T11:59:23.3526149Z Retrying single test... 
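Note on the failure above: the RuntimeError is raised by PyTorch's per-test CUDA memory leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which records caching-allocator and driver memory before the test and re-checks them when the policy context exits. The following is only a minimal sketch of that idea for illustration, not the actual common_utils.py implementation, and it assumes a single visible CUDA/ROCm device:

import torch

class LeakCheckSketch:
    # Rough illustration of a before/after memory comparison.
    # The real PyTorch check (torch/testing/_internal/common_utils.py) is more
    # involved: per-device tracking, driver-level confirmation, and retries.
    def __enter__(self):
        torch.cuda.synchronize()
        self.alloc_before = torch.cuda.memory_allocated()        # caching-allocator bytes
        free, total = torch.cuda.mem_get_info()
        self.driver_before = total - free                        # driver-allocated bytes
        return self

    def __exit__(self, exc_type, exc, tb):
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated()
        free, total = torch.cuda.mem_get_info()
        driver_after = total - free
        if exc_type is None and (alloc_after > self.alloc_before or driver_after > self.driver_before):
            raise RuntimeError(
                f"possible CUDA leak: allocator {self.alloc_before} -> {alloc_after}, "
                f"driver {self.driver_before} -> {driver_after}")

Used around a test body (with LeakCheckSketch(): ...), this reproduces the shape of the message seen in the log: allocator bytes before/after plus driver bytes before/after on the device.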
2025-12-04T11:59:23.3526419Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-8d06e9eb47bd46ec.xml 2025-12-04T11:59:23.3526727Z ============================= test session starts ============================== 2025-12-04T11:59:23.3526939Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3527128Z cachedir: .pytest_cache 2025-12-04T11:59:23.3527352Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3527591Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3527717Z configfile: pytest.ini 2025-12-04T11:59:23.3527948Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3528221Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3528570Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3528886Z Running 1 items in this shard 2025-12-04T11:59:23.3528958Z 2025-12-04T11:59:23.3529279Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 11:55:40.737000 207423 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 207492 2025-12-04T11:59:23.3529827Z I1204 11:55:40.737000 207423 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 207493 2025-12-04T11:59:23.3530179Z I1204 11:55:40.738000 207423 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 207494 2025-12-04T11:59:23.3530518Z I1204 11:55:40.739000 207423 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 207495 2025-12-04T11:59:23.3531231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3531820Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3532407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3532991Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3533587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3534165Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3534750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3535370Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3535606Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3535952Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3536442Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3536922Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3537405Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3537853Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3538302Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3538764Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3539223Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3539718Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3540183Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3540665Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3541122Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3541592Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3542268Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3542905Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3543256Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3543866Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3544419Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3544783Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3545200Z [rank3]:E1204 11:55:45.996000 207495 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3545442Z dist init r=3, world=4 2025-12-04T11:59:23.3545650Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3545987Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3546474Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3546958Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3547442Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3547894Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3548335Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3548801Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3549267Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3549804Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3550268Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3550719Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3551174Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3551642Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3552318Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3552947Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3553296Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3553930Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3554453Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3554818Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3555231Z [rank0]:E1204 11:55:46.004000 207492 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3555474Z dist init r=0, world=4 2025-12-04T11:59:23.3555676Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3556012Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3556501Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3556983Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3557465Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3557916Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3558358Z [rank1]:E1204 11:55:46.012000 207493 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3558822Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3559312Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3559813Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3560280Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3560731Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3561188Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3561657Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3562332Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 
2025-12-04T11:59:23.3562992Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3563340Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3563943Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3564459Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3564822Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3565238Z [rank1]:E1204 11:55:46.012000 207493 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3565481Z dist init r=1, world=4 2025-12-04T11:59:23.3565686Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3566024Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3566511Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3566989Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3567472Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3567923Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3568405Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3568870Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3569334Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3569851Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3570317Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3570771Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.3571228Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3571696Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3572401Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 2300575744 and is now 3055550464. 2025-12-04T11:59:23.3573035Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3573385Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3573993Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3574510Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3574872Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3575288Z [rank2]:E1204 11:55:46.060000 207494 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3575530Z dist init r=2, world=4 2025-12-04T11:59:23.3575633Z FAILED [6.2146s] [100%] 2025-12-04T11:59:23.3575695Z 2025-12-04T11:59:23.3575759Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3575956Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T11:59:23.3576144Z Traceback (most recent call last): 2025-12-04T11:59:23.3576390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3576638Z self._join_processes(fn) 2025-12-04T11:59:23.3576888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3577158Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3577458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3577721Z raise RuntimeError(error) 2025-12-04T11:59:23.3577874Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3578037Z Traceback (most recent call last): 2025-12-04T11:59:23.3578278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3578521Z getattr(self, test_name)() 2025-12-04T11:59:23.3578760Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3578990Z fn() 2025-12-04T11:59:23.3579194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3579427Z method(*args, **kwargs) 2025-12-04T11:59:23.3579698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3579928Z method(*args, **kwargs) 2025-12-04T11:59:23.3580148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3580377Z with policy(): 2025-12-04T11:59:23.3580592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3580825Z raise RuntimeError(msg) 2025-12-04T11:59:23.3581290Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3581679Z 2025-12-04T11:59:23.3581758Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3582117Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3582399Z 2025-12-04T11:59:23.3582488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3582616Z 2025-12-04T11:59:23.3582618Z 2025-12-04T11:59:23.3582696Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3582901Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.3583284Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-8d06e9eb47bd46ec.xml - 2025-12-04T11:59:23.3583634Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3583994Z FAILED [6.2146s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3584335Z Traceback (most recent call last): 2025-12-04T11:59:23.3584581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3584827Z getattr(self, test_name)() 2025-12-04T11:59:23.3585062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3585302Z fn() 2025-12-04T11:59:23.3585505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3585737Z method(*args, **kwargs) 2025-12-04T11:59:23.3585957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3586187Z method(*args, **kwargs) 2025-12-04T11:59:23.3586436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3586665Z with policy(): 2025-12-04T11:59:23.3586878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3587110Z raise RuntimeError(msg) 2025-12-04T11:59:23.3587541Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3587934Z 2025-12-04T11:59:23.3588010Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3588370Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3588654Z 2025-12-04T11:59:23.3588745Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3588934Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3589100Z ======================= 1 failed, 7 deselected in 6.22s ======================== 2025-12-04T11:59:23.3589240Z Got exit code 1 2025-12-04T11:59:23.3589335Z Retrying single test... 
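Note on the UserWarning from torch/distributed/fsdp/_init_utils.py seen in these runs ("FSDP got the argument `device_id` cuda ... which does not have an explicit index"): the warning itself names two remedies. The sketch below shows both, under the assumption that the process group is already initialized and that `local_rank` (a hypothetical variable, not taken from this log) identifies the GPU for the current process:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes dist.init_process_group(...) has already run for this process.
local_rank = dist.get_rank() % torch.cuda.device_count()

# Option 1 (what the warning suggests): pin the current device first, so a
# bare `device_id="cuda"` resolves unambiguously to this rank's GPU.
torch.cuda.set_device(local_rank)

# Option 2: pass an explicitly indexed device, so FSDP never has to guess.
model = torch.nn.Linear(8, 8)
fsdp_model = FSDP(model, device_id=torch.device("cuda", local_rank))

Either option silences the warning; it is informational only and is not what makes this test fail (the failure is the memory leak check described earlier).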
2025-12-04T11:59:23.3589650Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d90e5df4b924c6b2.xml 2025-12-04T11:59:23.3589987Z ============================= test session starts ============================== 2025-12-04T11:59:23.3590200Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3590393Z cachedir: .pytest_cache 2025-12-04T11:59:23.3590620Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3590862Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3590980Z configfile: pytest.ini 2025-12-04T11:59:23.3591210Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3591487Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3591833Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3592153Z Running 1 items in this shard 2025-12-04T11:59:23.3592226Z 2025-12-04T11:59:23.3592550Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 11:55:49.426000 207801 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 207870 2025-12-04T11:59:23.3593067Z I1204 11:55:49.427000 207801 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 207871 2025-12-04T11:59:23.3593417Z I1204 11:55:49.428000 207801 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 207872 2025-12-04T11:59:23.3593758Z I1204 11:55:49.428000 207801 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 207873 2025-12-04T11:59:23.3594450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3595051Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3595666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3596253Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3596838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3597420Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3598011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3598594Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3598831Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3599203Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3599734Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3600217Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3600699Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3601155Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3601602Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3602067Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3602535Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3603004Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3603473Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3603928Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3604388Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3604896Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3605572Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3606207Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3606558Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3607169Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3607689Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3608057Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3608507Z [rank3]:E1204 11:55:54.729000 207873 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3608753Z dist init r=3, world=4 2025-12-04T11:59:23.3608957Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3609295Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3609815Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3610301Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3610783Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3611237Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3611683Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3612156Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3612631Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3613104Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3613572Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3614064Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3614523Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3614993Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3615665Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 2300575744 and is now 3055550464. 2025-12-04T11:59:23.3616298Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3616651Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3617261Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3617818Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3618181Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3618596Z [rank2]:E1204 11:55:54.733000 207872 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3618839Z dist init r=2, world=4 2025-12-04T11:59:23.3619040Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3619380Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3619904Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3620391Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3620878Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3621328Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3621771Z [rank1]:E1204 11:55:54.745000 207871 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3622237Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3622707Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3623171Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3623670Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3624127Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3624584Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3625056Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3625735Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 2317352960 and is now 3072327680. 
2025-12-04T11:59:23.3626368Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3626720Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3627361Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3627873Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3628237Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3628648Z [rank1]:E1204 11:55:54.745000 207871 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3628893Z dist init r=1, world=4 2025-12-04T11:59:23.3629090Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3629433Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3629950Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3630428Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3630904Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3631351Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3631795Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3632259Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3632749Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3633211Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3633671Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3634126Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.3634577Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3635046Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3635717Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2459959296 and is now 3214934016. 2025-12-04T11:59:23.3636379Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3636726Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3637332Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3637846Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3638202Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3638613Z [rank0]:E1204 11:55:54.755000 207870 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3638848Z dist init r=0, world=4 2025-12-04T11:59:23.3638945Z FAILED [6.2159s] [100%] 2025-12-04T11:59:23.3639006Z 2025-12-04T11:59:23.3639064Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3639268Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T11:59:23.3639449Z Traceback (most recent call last): 2025-12-04T11:59:23.3639730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3639968Z self._join_processes(fn) 2025-12-04T11:59:23.3640212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3640476Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3640744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3641002Z raise RuntimeError(error) 2025-12-04T11:59:23.3641159Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3641323Z Traceback (most recent call last): 2025-12-04T11:59:23.3641592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3641839Z getattr(self, test_name)() 2025-12-04T11:59:23.3642073Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3642311Z fn() 2025-12-04T11:59:23.3642515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3642751Z method(*args, **kwargs) 2025-12-04T11:59:23.3642973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3643208Z method(*args, **kwargs) 2025-12-04T11:59:23.3643429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3643660Z with policy(): 2025-12-04T11:59:23.3643878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3644115Z raise RuntimeError(msg) 2025-12-04T11:59:23.3644541Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3644935Z 2025-12-04T11:59:23.3645046Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3645408Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3645692Z 2025-12-04T11:59:23.3645782Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3645910Z 2025-12-04T11:59:23.3645912Z 2025-12-04T11:59:23.3645991Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3646197Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.3646575Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d90e5df4b924c6b2.xml - 2025-12-04T11:59:23.3646920Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3647287Z FAILED [6.2159s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3647626Z Traceback (most recent call last): 2025-12-04T11:59:23.3647873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3648120Z getattr(self, test_name)() 2025-12-04T11:59:23.3648355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3648592Z fn() 2025-12-04T11:59:23.3648799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3649034Z method(*args, **kwargs) 2025-12-04T11:59:23.3649259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3649494Z method(*args, **kwargs) 2025-12-04T11:59:23.3649759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3649990Z with policy(): 2025-12-04T11:59:23.3650207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3650443Z raise RuntimeError(msg) 2025-12-04T11:59:23.3650907Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 2250244096 and is now 3005218816. 2025-12-04T11:59:23.3651304Z 2025-12-04T11:59:23.3651381Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3651742Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3652021Z 2025-12-04T11:59:23.3652114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3652308Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
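The RuntimeError above compares two measurements per device: bytes held by PyTorch's caching allocator (512 before the test, 2560 after) and bytes the driver reports as allocated. What follows is only a minimal, illustrative sketch of that kind of before/after comparison, assuming a visible CUDA/ROCm device; the names snapshot and check_for_leak are invented here, and this is not the implementation in torch/testing/_internal/common_utils.py.

import gc
import torch

def snapshot(device: int):
    # Settle outstanding work and return cached blocks before measuring.
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocator_bytes = torch.cuda.memory_allocated(device)   # "Caching allocator allocated memory"
    free, total = torch.cuda.mem_get_info(device)
    driver_bytes = total - free                              # "CUDA driver allocated memory"
    return allocator_bytes, driver_bytes

def check_for_leak(fn, device: int = 0) -> None:
    before = snapshot(device)
    fn()
    after = snapshot(device)
    # Flag growth in both the allocator view and the driver view, which is the
    # situation the "CUDA driver API confirmed a leak" message above describes.
    if after[0] > before[0] and after[1] > before[1]:
        raise RuntimeError(f"possible leak on device {device}: {before} -> {after}")

The repro line printed with each failure (PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py ...) re-runs a single test with this per-device check enabled.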
2025-12-04T11:59:23.3652375Z ======================= 1 failed, 7 deselected in 6.23s ======================== 2025-12-04T11:59:23.3652423Z Got exit code 1 2025-12-04T11:59:23.3652617Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T11:59:23.3652751Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.3652960Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d02ca8307c9a881c.xml 2025-12-04T11:59:23.3653058Z ============================= test session starts ============================== 2025-12-04T11:59:23.3653174Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3653220Z cachedir: .pytest_cache 2025-12-04T11:59:23.3653385Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3653439Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3653481Z configfile: pytest.ini 2025-12-04T11:59:23.3653650Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3653724Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T11:59:23.3653784Z stepcurrent: skipping 2 already run items. 2025-12-04T11:59:23.3653829Z Running 6 items in this shard 2025-12-04T11:59:23.3653831Z 2025-12-04T11:59:23.3654193Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 11:55:58.229000 208179 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 208248 2025-12-04T11:59:23.3654355Z I1204 11:55:58.229000 208179 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 208249 2025-12-04T11:59:23.3654511Z I1204 11:55:58.230000 208179 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 208250 2025-12-04T11:59:23.3654668Z I1204 11:55:58.230000 208179 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 208251 2025-12-04T11:59:23.3655169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3655238Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3655754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3655818Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3656307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3656367Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3656859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3656915Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3657059Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3657225Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3657540Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3657701Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3657994Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3658124Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3658406Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3658563Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3658842Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3658993Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3659273Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3659412Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3659748Z [rank2]:E1204 11:56:05.254000 208250 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3659898Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3660454Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3660576Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3660776Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3661189Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3661305Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3661521Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3661687Z [rank2]:E1204 11:56:05.254000 208250 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3661767Z dist init r=2, world=4 2025-12-04T11:59:23.3661909Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3662074Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3662369Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3662524Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3662816Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3662946Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3663231Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3663381Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3663664Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3663814Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3664094Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3664235Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3664536Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3664687Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3665203Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3665323Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3665524Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3665936Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3666051Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3666287Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3666456Z [rank1]:E1204 11:56:05.258000 208249 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3666495Z dist init r=1, world=4 2025-12-04T11:59:23.3666637Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3666796Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3672403Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3672568Z [rank3]:E1204 11:56:05.275000 208251 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3672853Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3672981Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3673259Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3673408Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3673687Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3673834Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3674156Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3674293Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3674576Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3674728Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3675245Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
2025-12-04T11:59:23.3675358Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3675555Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3675962Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3676107Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3676320Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3676486Z [rank3]:E1204 11:56:05.275000 208251 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3676526Z dist init r=3, world=4 2025-12-04T11:59:23.3676664Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3676824Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3677112Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3677268Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3677554Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3677677Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3677957Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3678104Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3678625Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3678775Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3679053Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3679197Z [rank0]:E1204 11:56:05.336000 208248 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3679479Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3679668Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3680200Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3680341Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3680542Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3680948Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3681063Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3681278Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3681445Z [rank0]:E1204 11:56:05.336000 208248 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3681487Z dist init r=0, world=4 2025-12-04T11:59:23.3681846Z [rank0]:[W1204 11:56:05.732202649 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3681890Z FAILED [8.9182s] [ 16%] 2025-12-04T11:59:23.3681892Z 2025-12-04T11:59:23.3681951Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3682090Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T11:59:23.3682137Z Traceback (most recent call last): 2025-12-04T11:59:23.3682306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3682353Z self._join_processes(fn) 2025-12-04T11:59:23.3682533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3682590Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3682811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3682856Z raise RuntimeError(error) 2025-12-04T11:59:23.3682942Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3682989Z Traceback (most recent call last): 2025-12-04T11:59:23.3683153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3683198Z getattr(self, test_name)() 2025-12-04T11:59:23.3683357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3683399Z fn() 2025-12-04T11:59:23.3683554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3683599Z method(*args, **kwargs) 2025-12-04T11:59:23.3683754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3683800Z method(*args, **kwargs) 2025-12-04T11:59:23.3683952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3683996Z with policy(): 2025-12-04T11:59:23.3684151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3684195Z raise RuntimeError(msg) 2025-12-04T11:59:23.3684592Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
2025-12-04T11:59:23.3684618Z 2025-12-04T11:59:23.3684698Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3684984Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3684990Z 2025-12-04T11:59:23.3685081Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3685083Z 2025-12-04T11:59:23.3685146Z Process 2 exited with error code 10 and exception: 2025-12-04T11:59:23.3685192Z Traceback (most recent call last): 2025-12-04T11:59:23.3685361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3685406Z getattr(self, test_name)() 2025-12-04T11:59:23.3685571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3685607Z fn() 2025-12-04T11:59:23.3685762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3685803Z method(*args, **kwargs) 2025-12-04T11:59:23.3685957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3685997Z method(*args, **kwargs) 2025-12-04T11:59:23.3686149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3686187Z with policy(): 2025-12-04T11:59:23.3686344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3686385Z raise RuntimeError(msg) 2025-12-04T11:59:23.3686799Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 
2025-12-04T11:59:23.3686802Z 2025-12-04T11:59:23.3686879Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3687160Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3687162Z 2025-12-04T11:59:23.3687256Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3687260Z 2025-12-04T11:59:23.3687319Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3687367Z Traceback (most recent call last): 2025-12-04T11:59:23.3687531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3687577Z getattr(self, test_name)() 2025-12-04T11:59:23.3687740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3687778Z fn() 2025-12-04T11:59:23.3687931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3687973Z method(*args, **kwargs) 2025-12-04T11:59:23.3688124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3688166Z method(*args, **kwargs) 2025-12-04T11:59:23.3688343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3688384Z with policy(): 2025-12-04T11:59:23.3688535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3688580Z raise RuntimeError(msg) 2025-12-04T11:59:23.3688971Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3688973Z 2025-12-04T11:59:23.3689047Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3689329Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3689333Z 2025-12-04T11:59:23.3689419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3689421Z 2025-12-04T11:59:23.3689423Z 2025-12-04T11:59:23.3689504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3689627Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.3689885Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d02ca8307c9a881c.xml - 2025-12-04T11:59:23.3689946Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3690246Z FAILED [8.9182s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3690295Z Traceback (most recent call last): 2025-12-04T11:59:23.3690462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3690507Z getattr(self, test_name)() 2025-12-04T11:59:23.3690670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3690735Z fn() 2025-12-04T11:59:23.3690891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3690933Z method(*args, **kwargs) 2025-12-04T11:59:23.3691084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3691126Z method(*args, **kwargs) 2025-12-04T11:59:23.3691277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3691320Z with policy(): 2025-12-04T11:59:23.3691475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3691518Z raise RuntimeError(msg) 2025-12-04T11:59:23.3691908Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
2025-12-04T11:59:23.3691910Z 2025-12-04T11:59:23.3691987Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3692269Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3692299Z 2025-12-04T11:59:23.3692385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3692387Z 2025-12-04T11:59:23.3692447Z Process 2 exited with error code 10 and exception: 2025-12-04T11:59:23.3692491Z Traceback (most recent call last): 2025-12-04T11:59:23.3692655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3692700Z getattr(self, test_name)() 2025-12-04T11:59:23.3692863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3692897Z fn() 2025-12-04T11:59:23.3693052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3693091Z method(*args, **kwargs) 2025-12-04T11:59:23.3693246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3693287Z method(*args, **kwargs) 2025-12-04T11:59:23.3693440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3693477Z with policy(): 2025-12-04T11:59:23.3693637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3693679Z raise RuntimeError(msg) 2025-12-04T11:59:23.3694071Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 
2025-12-04T11:59:23.3694074Z 2025-12-04T11:59:23.3694149Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3694433Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3694435Z 2025-12-04T11:59:23.3694524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3694526Z 2025-12-04T11:59:23.3694584Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3694655Z Traceback (most recent call last): 2025-12-04T11:59:23.3694819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3694868Z getattr(self, test_name)() 2025-12-04T11:59:23.3695028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3695066Z fn() 2025-12-04T11:59:23.3695219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3695264Z method(*args, **kwargs) 2025-12-04T11:59:23.3695415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3695457Z method(*args, **kwargs) 2025-12-04T11:59:23.3695609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3695651Z with policy(): 2025-12-04T11:59:23.3695807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3695848Z raise RuntimeError(msg) 2025-12-04T11:59:23.3696241Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3696264Z 2025-12-04T11:59:23.3696338Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3696622Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3696626Z 2025-12-04T11:59:23.3696712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3696779Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3696843Z ======================= 1 failed, 2 deselected in 8.93s ======================== 2025-12-04T11:59:23.3696883Z Got exit code 1 2025-12-04T11:59:23.3696924Z Retrying single test... 
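The ProcessGroupNCCL warning earlier in this block ("destroy_process_group() was not called before program exit, which can leak resources") points at the shutdown step a standalone torch.distributed script would normally perform. Below is a minimal sketch, assuming a single CPU-only rank with the gloo backend so it runs without a GPU (the CI job itself uses NCCL/RCCL); the rendezvous address and port are placeholders.

import os
import torch.distributed as dist

# Rendezvous settings for a one-process group; the values are arbitrary placeholders.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo", rank=0, world_size=1)
try:
    dist.barrier()  # stand-in for the real collective work
finally:
    # Explicit shutdown; skipping this is what triggers the warning quoted above.
    dist.destroy_process_group()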
2025-12-04T11:59:23.3697135Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5eb9510b4a5dc42.xml 2025-12-04T11:59:23.3697197Z ============================= test session starts ============================== 2025-12-04T11:59:23.3697316Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3697361Z cachedir: .pytest_cache 2025-12-04T11:59:23.3697527Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3697576Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3697617Z configfile: pytest.ini 2025-12-04T11:59:23.3697787Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3697860Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3698137Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3698183Z Running 1 items in this shard 2025-12-04T11:59:23.3698185Z 2025-12-04T11:59:23.3698568Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 11:56:09.846000 208581 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 208650 2025-12-04T11:59:23.3698727Z I1204 11:56:09.847000 208581 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 208651 2025-12-04T11:59:23.3698882Z I1204 11:56:09.847000 208581 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 208652 2025-12-04T11:59:23.3699032Z I1204 11:56:09.848000 208581 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 208653 2025-12-04T11:59:23.3699539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3699650Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3700148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3700213Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3700728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3700792Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3701285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3701342Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3701490Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3701655Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3701955Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3702114Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3702409Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3702539Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3702820Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3702970Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3703273Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3703423Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3703701Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3703842Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3704126Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3704274Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3704801Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3704939Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3705138Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3705547Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3705664Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3705883Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3706050Z [rank2]:E1204 11:56:16.875000 208652 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3706090Z dist init r=2, world=4 2025-12-04T11:59:23.3706232Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3706393Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3706683Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3706838Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3707130Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3707254Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3707587Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3707735Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3708014Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3708165Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3708444Z [rank1]:E1204 11:56:16.877000 208651 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3708582Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3708861Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3709011Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3709553Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3709710Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3709908Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3710315Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3710431Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3710644Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3710810Z [rank1]:E1204 11:56:16.877000 208651 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3710849Z dist init r=1, world=4 2025-12-04T11:59:23.3710989Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3711149Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3711439Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3711593Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3711905Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3712027Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T11:59:23.3712304Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3712455Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3712733Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3712882Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3713162Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3713297Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3713619Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3713767Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3714290Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
2025-12-04T11:59:23.3714407Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3714604Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3715013Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3715126Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3715338Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3715502Z [rank3]:E1204 11:56:16.880000 208653 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3715542Z dist init r=3, world=4 2025-12-04T11:59:23.3715677Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3715837Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3716144Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3716298Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3716582Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3716706Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3716984Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3717132Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3717410Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3717557Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3717850Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3717987Z [rank0]:E1204 11:56:16.924000 208650 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3718265Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3718415Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3718927Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3719044Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3719244Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3719719Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3719834Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3720045Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3720211Z [rank0]:E1204 11:56:16.924000 208650 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3720250Z dist init r=0, world=4 2025-12-04T11:59:23.3720617Z [rank0]:[W1204 11:56:17.210056491 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3720659Z FAILED [9.0183s] [100%] 2025-12-04T11:59:23.3720662Z 2025-12-04T11:59:23.3720719Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3720860Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T11:59:23.3720906Z Traceback (most recent call last): 2025-12-04T11:59:23.3721073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3721116Z self._join_processes(fn) 2025-12-04T11:59:23.3721294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3721347Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3721531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3721575Z raise RuntimeError(error) 2025-12-04T11:59:23.3721657Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3721702Z Traceback (most recent call last): 2025-12-04T11:59:23.3721894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3721937Z getattr(self, test_name)() 2025-12-04T11:59:23.3722099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3722134Z fn() 2025-12-04T11:59:23.3722291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3722332Z method(*args, **kwargs) 2025-12-04T11:59:23.3722485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3722525Z method(*args, **kwargs) 2025-12-04T11:59:23.3722679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3722718Z with policy(): 2025-12-04T11:59:23.3722873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3722916Z raise RuntimeError(msg) 2025-12-04T11:59:23.3723307Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
2025-12-04T11:59:23.3723310Z 2025-12-04T11:59:23.3723388Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3723670Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3723672Z 2025-12-04T11:59:23.3723763Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3723767Z 2025-12-04T11:59:23.3723826Z Process 2 exited with error code 10 and exception: 2025-12-04T11:59:23.3723873Z Traceback (most recent call last): 2025-12-04T11:59:23.3724039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3724082Z getattr(self, test_name)() 2025-12-04T11:59:23.3724263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3724300Z fn() 2025-12-04T11:59:23.3724453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3724492Z method(*args, **kwargs) 2025-12-04T11:59:23.3724645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3724686Z method(*args, **kwargs) 2025-12-04T11:59:23.3724840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3724877Z with policy(): 2025-12-04T11:59:23.3725032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3725073Z raise RuntimeError(msg) 2025-12-04T11:59:23.3725465Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 
2025-12-04T11:59:23.3725468Z 2025-12-04T11:59:23.3725542Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3725827Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3725850Z 2025-12-04T11:59:23.3725936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3725939Z 2025-12-04T11:59:23.3725998Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3726043Z Traceback (most recent call last): 2025-12-04T11:59:23.3726209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3726251Z getattr(self, test_name)() 2025-12-04T11:59:23.3726414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3726448Z fn() 2025-12-04T11:59:23.3726597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3726639Z method(*args, **kwargs) 2025-12-04T11:59:23.3726789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3726827Z method(*args, **kwargs) 2025-12-04T11:59:23.3726979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3727015Z with policy(): 2025-12-04T11:59:23.3727169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3727211Z raise RuntimeError(msg) 2025-12-04T11:59:23.3727598Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3727602Z 2025-12-04T11:59:23.3727676Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3727958Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3727961Z 2025-12-04T11:59:23.3728046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3728069Z 2025-12-04T11:59:23.3728071Z 2025-12-04T11:59:23.3728149Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3728237Z Process 1 terminated with exit code 10, terminating remaining processes. 
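The two warnings that precede this failure point at one setup pattern: each rank should select its own GPU before the model is wrapped in FSDP, and the process group should be torn down explicitly before the process exits. A minimal sketch of that pattern, assuming a per-rank entry point (the helper name run_rank, the rendezvous defaults, and the placeholder body are illustrative, not taken from this test):

    import os
    import torch
    import torch.distributed as dist

    def run_rank(rank: int, world_size: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")   # rendezvous defaults for a single-node run
        os.environ.setdefault("MASTER_PORT", "29500")
        torch.cuda.set_device(rank)                          # pin this process to its GPU before FSDP init
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            # build the FSDP-wrapped model and run the test body here
            pass
        finally:
            dist.destroy_process_group()                     # explicit teardown avoids the shutdown warning

With set_device called per rank, a device_id of "cuda" without an explicit index resolves to the intended current device, and the explicit destroy_process_group() call is what the ProcessGroupNCCL warning above asks for.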
2025-12-04T11:59:23.3728493Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5eb9510b4a5dc42.xml - 2025-12-04T11:59:23.3728556Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3728853Z FAILED [9.0183s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3728898Z Traceback (most recent call last): 2025-12-04T11:59:23.3729066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3729107Z getattr(self, test_name)() 2025-12-04T11:59:23.3729268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3729302Z fn() 2025-12-04T11:59:23.3729456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3729496Z method(*args, **kwargs) 2025-12-04T11:59:23.3729721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3729761Z method(*args, **kwargs) 2025-12-04T11:59:23.3729915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3729954Z with policy(): 2025-12-04T11:59:23.3730111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3730152Z raise RuntimeError(msg) 2025-12-04T11:59:23.3730539Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
2025-12-04T11:59:23.3730542Z 2025-12-04T11:59:23.3730615Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3730895Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3730897Z 2025-12-04T11:59:23.3730985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3730987Z 2025-12-04T11:59:23.3731046Z Process 2 exited with error code 10 and exception: 2025-12-04T11:59:23.3731092Z Traceback (most recent call last): 2025-12-04T11:59:23.3731255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3731297Z getattr(self, test_name)() 2025-12-04T11:59:23.3731460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3731494Z fn() 2025-12-04T11:59:23.3731647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3731687Z method(*args, **kwargs) 2025-12-04T11:59:23.3731836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3731874Z method(*args, **kwargs) 2025-12-04T11:59:23.3732055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3732091Z with policy(): 2025-12-04T11:59:23.3732242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3732283Z raise RuntimeError(msg) 2025-12-04T11:59:23.3732670Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 
2025-12-04T11:59:23.3732674Z 2025-12-04T11:59:23.3732746Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3733028Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3733030Z 2025-12-04T11:59:23.3733117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3733119Z 2025-12-04T11:59:23.3733178Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3733222Z Traceback (most recent call last): 2025-12-04T11:59:23.3733388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3733458Z getattr(self, test_name)() 2025-12-04T11:59:23.3733619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3733653Z fn() 2025-12-04T11:59:23.3733802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3733841Z method(*args, **kwargs) 2025-12-04T11:59:23.3733992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3734032Z method(*args, **kwargs) 2025-12-04T11:59:23.3734180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3734217Z with policy(): 2025-12-04T11:59:23.3734369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3734411Z raise RuntimeError(msg) 2025-12-04T11:59:23.3734797Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3734799Z 2025-12-04T11:59:23.3734874Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3735150Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3735152Z 2025-12-04T11:59:23.3735236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3735301Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3735364Z ======================= 1 failed, 7 deselected in 9.03s ======================== 2025-12-04T11:59:23.3735401Z Got exit code 1 2025-12-04T11:59:23.3735440Z Retrying single test... 
2025-12-04T11:59:23.3735649Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-db14df267c0d86ac.xml 2025-12-04T11:59:23.3735707Z ============================= test session starts ============================== 2025-12-04T11:59:23.3735841Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3735881Z cachedir: .pytest_cache 2025-12-04T11:59:23.3736041Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3736087Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3736126Z configfile: pytest.ini 2025-12-04T11:59:23.3736289Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3736366Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3736637Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3736682Z Running 1 items in this shard 2025-12-04T11:59:23.3736684Z 2025-12-04T11:59:23.3737040Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 11:56:21.270000 208983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 209052 2025-12-04T11:59:23.3737194Z I1204 11:56:21.271000 208983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 209053 2025-12-04T11:59:23.3737349Z I1204 11:56:21.272000 208983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 209054 2025-12-04T11:59:23.3737519Z I1204 11:56:21.272000 208983 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 209055 2025-12-04T11:59:23.3738020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3738082Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3738569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3738632Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3739117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3739175Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3739703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3739761Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3739905Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3740099Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3740392Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3740545Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3740834Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3740957Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3741238Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3741385Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3741667Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3741846Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3742123Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3742261Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3742540Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3742689Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3743206Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3743324Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3743521Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3743928Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3744043Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3744253Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3744445Z [rank0]:E1204 11:56:28.309000 209052 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3744484Z dist init r=0, world=4 2025-12-04T11:59:23.3744623Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3744783Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3745069Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3745223Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3745508Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3745630Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3745907Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3746079Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3746359Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3746506Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3746783Z [rank2]:E1204 11:56:28.313000 209054 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3746918Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3747198Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3747345Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3747860Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3747973Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3748168Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3748581Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3748711Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3748920Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3749082Z [rank2]:E1204 11:56:28.313000 209054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3749120Z dist init r=2, world=4 2025-12-04T11:59:23.3749259Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3749417Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3749752Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3749904Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3750188Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3750337Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T11:59:23.3750615Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3750762Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3751042Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3751187Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3751462Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3751604Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3751884Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3752032Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3752543Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
2025-12-04T11:59:23.3752658Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3752853Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3753286Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3753399Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3753608Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3753773Z [rank3]:E1204 11:56:28.321000 209055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3753811Z dist init r=3, world=4 2025-12-04T11:59:23.3753949Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3754107Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3754394Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3754547Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3754850Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3754971Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3755249Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3755398Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3755674Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3755824Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3756102Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3756238Z [rank1]:E1204 11:56:28.325000 209053 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3756517Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3756664Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3757203Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3757317Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3757513Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3757921Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3758034Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3758245Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3758408Z [rank1]:E1204 11:56:28.325000 209053 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3758448Z dist init r=1, world=4 2025-12-04T11:59:23.3758785Z [rank0]:[W1204 11:56:28.470384681 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3758845Z FAILED [8.8188s] [100%] 2025-12-04T11:59:23.3758847Z 2025-12-04T11:59:23.3758903Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3759041Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T11:59:23.3759086Z Traceback (most recent call last): 2025-12-04T11:59:23.3759251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3759295Z self._join_processes(fn) 2025-12-04T11:59:23.3759471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3759524Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3759744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3759790Z raise RuntimeError(error) 2025-12-04T11:59:23.3759869Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3759912Z Traceback (most recent call last): 2025-12-04T11:59:23.3760072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3760114Z getattr(self, test_name)() 2025-12-04T11:59:23.3760274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3760308Z fn() 2025-12-04T11:59:23.3760459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3760499Z method(*args, **kwargs) 2025-12-04T11:59:23.3760648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3760689Z method(*args, **kwargs) 2025-12-04T11:59:23.3760839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3760875Z with policy(): 2025-12-04T11:59:23.3761027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3761068Z raise RuntimeError(msg) 2025-12-04T11:59:23.3761491Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
2025-12-04T11:59:23.3761493Z 2025-12-04T11:59:23.3761567Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3761845Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3761848Z 2025-12-04T11:59:23.3761934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3761936Z 2025-12-04T11:59:23.3761938Z 2025-12-04T11:59:23.3762014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3762104Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3762355Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-db14df267c0d86ac.xml - 2025-12-04T11:59:23.3762416Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3762706Z FAILED [8.8188s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3762779Z Traceback (most recent call last): 2025-12-04T11:59:23.3762946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3762988Z getattr(self, test_name)() 2025-12-04T11:59:23.3763149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3763183Z fn() 2025-12-04T11:59:23.3763335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3763373Z method(*args, **kwargs) 2025-12-04T11:59:23.3763523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3763564Z method(*args, **kwargs) 2025-12-04T11:59:23.3763716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3763752Z with policy(): 2025-12-04T11:59:23.3763907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3763948Z raise RuntimeError(msg) 2025-12-04T11:59:23.3764345Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3764348Z 2025-12-04T11:59:23.3764420Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3764700Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3764704Z 2025-12-04T11:59:23.3764789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3764852Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3764913Z ======================= 1 failed, 7 deselected in 8.83s ======================== 2025-12-04T11:59:23.3764974Z Got exit code 1 2025-12-04T11:59:23.3765201Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3765331Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.3765538Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ea93d0d366cc9967.xml 2025-12-04T11:59:23.3765598Z ============================= test session starts ============================== 2025-12-04T11:59:23.3765710Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3765751Z cachedir: .pytest_cache 2025-12-04T11:59:23.3765909Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3765956Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3765995Z configfile: pytest.ini 2025-12-04T11:59:23.3766160Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3766232Z collecting ... collected 8 items / 3 deselected / 5 selected 2025-12-04T11:59:23.3766283Z stepcurrent: skipping 3 already run items. 2025-12-04T11:59:23.3766325Z Running 5 items in this shard 2025-12-04T11:59:23.3766327Z 2025-12-04T11:59:23.3766703Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 11:56:32.688000 209385 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 209454 2025-12-04T11:59:23.3766858Z I1204 11:56:32.689000 209385 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 209455 2025-12-04T11:59:23.3767012Z I1204 11:56:32.689000 209385 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 209456 2025-12-04T11:59:23.3767164Z I1204 11:56:32.690000 209385 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 209457 2025-12-04T11:59:23.3767657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3767722Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3768211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3768270Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3768758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3768816Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3769324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3769381Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3769522Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3769728Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3770023Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3770178Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3770469Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3770593Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3770872Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3771045Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3771326Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3771473Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3771749Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3771884Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3772164Z [rank0]:E1204 11:56:39.808000 209454 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3772312Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3772826Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3772940Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3773137Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3773569Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3773682Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3773894Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3774058Z [rank0]:E1204 11:56:39.808000 209454 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3774098Z dist init r=0, world=4 2025-12-04T11:59:23.3774237Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3774395Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3774684Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3774836Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3775121Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3775268Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3775545Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3775695Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3775973Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3776119Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3776398Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3776535Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3776815Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3776963Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3777482Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3777597Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3777812Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3778218Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3778332Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3778541Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3778706Z [rank3]:E1204 11:56:39.810000 209457 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3778746Z dist init r=3, world=4 2025-12-04T11:59:23.3778883Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3779042Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3779329Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3779500Z [rank2]:E1204 11:56:39.850000 209456 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3779814Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3779938Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3780214Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3780361Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3780641Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3780786Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3781064Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3781200Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3781480Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3781632Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3782175Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 
2025-12-04T11:59:23.3782288Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3782482Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3782890Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3783003Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3783215Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3783381Z [rank2]:E1204 11:56:39.850000 209456 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3783418Z dist init r=2, world=4 2025-12-04T11:59:23.3783558Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3783740Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3784029Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3784184Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3784471Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3784595Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3784875Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3785024Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3785302Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3785450Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3785725Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3785865Z [rank1]:E1204 11:56:39.863000 209455 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3786141Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3786310Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3786823Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3786938Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3787137Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3787543Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3787657Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3787867Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3788053Z [rank1]:E1204 11:56:39.863000 209455 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3788094Z dist init r=1, world=4 2025-12-04T11:59:23.3788433Z [rank0]:[W1204 11:56:39.974212773 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3788475Z FAILED [8.9210s] [ 20%] 2025-12-04T11:59:23.3788476Z 2025-12-04T11:59:23.3788534Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3788671Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T11:59:23.3788717Z Traceback (most recent call last): 2025-12-04T11:59:23.3788886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3788932Z self._join_processes(fn) 2025-12-04T11:59:23.3789110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3789163Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3789346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3789390Z raise RuntimeError(error) 2025-12-04T11:59:23.3789473Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3789517Z Traceback (most recent call last): 2025-12-04T11:59:23.3789712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3789755Z getattr(self, test_name)() 2025-12-04T11:59:23.3789916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3789952Z fn() 2025-12-04T11:59:23.3790106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3790145Z method(*args, **kwargs) 2025-12-04T11:59:23.3790300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3790368Z method(*args, **kwargs) 2025-12-04T11:59:23.3790521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3790557Z with policy(): 2025-12-04T11:59:23.3790712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3790755Z raise RuntimeError(msg) 2025-12-04T11:59:23.3791140Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
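[editor's note] The RuntimeError above comes from the test harness's CUDA memory-leak check, which records the caching-allocator and driver-level allocations before the test body and compares them afterwards; the reported numbers (512 -> 3072 bytes on device 0, roughly 2.46 GB -> 3.66 GB at the driver level) are that comparison. The sketch below only illustrates the idea with public torch.cuda APIs; it is not the CudaMemoryLeakCheck implementation in common_utils.py, and the function name and threshold logic are assumptions.

    import torch

    def check_for_leak(fn, device: int = 0) -> None:
        """Run fn() and flag GPU memory still held afterwards (illustrative only)."""
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before                   # driver-level allocation

        fn()

        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )
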
2025-12-04T11:59:23.3791144Z 2025-12-04T11:59:23.3791221Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3791507Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3791509Z 2025-12-04T11:59:23.3791598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3791600Z 2025-12-04T11:59:23.3791602Z 2025-12-04T11:59:23.3791680Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3791768Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3792057Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ea93d0d366cc9967.xml - 2025-12-04T11:59:23.3792119Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3792417Z FAILED [8.9210s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3792461Z Traceback (most recent call last): 2025-12-04T11:59:23.3792629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3792672Z getattr(self, test_name)() 2025-12-04T11:59:23.3792835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3792872Z fn() 2025-12-04T11:59:23.3793025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3793065Z method(*args, **kwargs) 2025-12-04T11:59:23.3793220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3793258Z method(*args, **kwargs) 2025-12-04T11:59:23.3793412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3793450Z with policy(): 2025-12-04T11:59:23.3793605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3793646Z raise RuntimeError(msg) 2025-12-04T11:59:23.3794039Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3794043Z 2025-12-04T11:59:23.3794120Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3794425Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3794427Z 2025-12-04T11:59:23.3794516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3794580Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3794644Z ======================= 1 failed, 3 deselected in 8.93s ======================== 2025-12-04T11:59:23.3794680Z Got exit code 1 2025-12-04T11:59:23.3794725Z Retrying single test... 2025-12-04T11:59:23.3794934Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-4cc42fc7936fe35d.xml 2025-12-04T11:59:23.3794995Z ============================= test session starts ============================== 2025-12-04T11:59:23.3795109Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3795151Z cachedir: .pytest_cache 2025-12-04T11:59:23.3795312Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3795359Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3795399Z configfile: pytest.ini 2025-12-04T11:59:23.3795569Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3795642Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3795935Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3795980Z Running 1 items in this shard 2025-12-04T11:59:23.3795982Z 2025-12-04T11:59:23.3796333Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 11:56:44.456000 209787 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 209856 2025-12-04T11:59:23.3796492Z I1204 11:56:44.456000 209787 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 209857 2025-12-04T11:59:23.3796643Z I1204 11:56:44.457000 209787 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 209858 2025-12-04T11:59:23.3796796Z I1204 11:56:44.457000 209787 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 209859 2025-12-04T11:59:23.3797295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3797357Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3797850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3797911Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3798399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3798476Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3798965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3799025Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3799167Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3799331Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3799662Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3799821Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3800104Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3800258Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3800538Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3800688Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3800965Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3801114Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3801394Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3801531Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3801817Z [rank1]:E1204 11:56:51.563000 209857 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3801966Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3802488Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3802604Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3802825Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3803235Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3803351Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3803564Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3803730Z [rank1]:E1204 11:56:51.563000 209857 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3803769Z dist init r=1, world=4 2025-12-04T11:59:23.3803908Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3804065Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3804353Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3804526Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3804812Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3804937Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3805216Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3805364Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3805643Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3805793Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3806071Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3806210Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3806490Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3806640Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3807191Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3807304Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3807501Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3807905Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3808021Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3808235Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3808399Z [rank2]:E1204 11:56:51.577000 209858 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3808440Z dist init r=2, world=4 2025-12-04T11:59:23.3808577Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3808756Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3809044Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3809199Z [rank3]:E1204 11:56:51.589000 209859 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3809483Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3809643Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3809920Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3810068Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3810349Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3810495Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3810770Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3810908Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3811187Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3811362Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3811878Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
2025-12-04T11:59:23.3811994Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3812187Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3812591Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3812702Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3812914Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3813102Z [rank3]:E1204 11:56:51.589000 209859 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3813140Z dist init r=3, world=4 2025-12-04T11:59:23.3813278Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3813439Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3813727Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3813880Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3814166Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3814291Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3814580Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3814731Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3815009Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3815160Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3815436Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3815594Z [rank0]:E1204 11:56:51.623000 209856 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3815873Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3816023Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3816541Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3816656Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3816855Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3817258Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3817393Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3817608Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3817773Z [rank0]:E1204 11:56:51.623000 209856 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3817813Z dist init r=0, world=4 2025-12-04T11:59:23.3818154Z [rank0]:[W1204 11:56:51.878778688 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3818196Z FAILED [9.0198s] [100%] 2025-12-04T11:59:23.3818198Z 2025-12-04T11:59:23.3818254Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3818393Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T11:59:23.3818438Z Traceback (most recent call last): 2025-12-04T11:59:23.3818607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3818651Z self._join_processes(fn) 2025-12-04T11:59:23.3818829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3818882Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3819063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3819106Z raise RuntimeError(error) 2025-12-04T11:59:23.3819186Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3819231Z Traceback (most recent call last): 2025-12-04T11:59:23.3819394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3819437Z getattr(self, test_name)() 2025-12-04T11:59:23.3819629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3819663Z fn() 2025-12-04T11:59:23.3819840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3819882Z method(*args, **kwargs) 2025-12-04T11:59:23.3820034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3820076Z method(*args, **kwargs) 2025-12-04T11:59:23.3820227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3820268Z with policy(): 2025-12-04T11:59:23.3820420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3820463Z raise RuntimeError(msg) 2025-12-04T11:59:23.3820851Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
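[editor's note] The ProcessGroupNCCL warning repeated above points out that destroy_process_group() was never called before the worker processes exited, which the linked shutdown documentation advises doing explicitly. Below is a minimal sketch of the init/teardown pairing for a per-rank worker; the function name and the use of MASTER_ADDR/MASTER_PORT from the launcher are assumptions, not taken from the test above.

    import torch
    import torch.distributed as dist

    def run_worker(rank: int, world_size: int) -> None:
        # Assumes MASTER_ADDR / MASTER_PORT are provided by the launcher (illustrative).
        torch.cuda.set_device(rank)
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            # ... collective work for this rank goes here ...
            dist.barrier()
        finally:
            # Explicit shutdown avoids the "destroy_process_group() was not called" warning.
            dist.destroy_process_group()
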
2025-12-04T11:59:23.3820853Z 2025-12-04T11:59:23.3820931Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3821213Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3821244Z 2025-12-04T11:59:23.3821333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3821335Z 2025-12-04T11:59:23.3821337Z 2025-12-04T11:59:23.3821415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3821502Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3821755Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-4cc42fc7936fe35d.xml - 2025-12-04T11:59:23.3821815Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3822107Z FAILED [9.0198s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3822154Z Traceback (most recent call last): 2025-12-04T11:59:23.3822319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3822361Z getattr(self, test_name)() 2025-12-04T11:59:23.3822524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3822557Z fn() 2025-12-04T11:59:23.3822712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3822751Z method(*args, **kwargs) 2025-12-04T11:59:23.3822904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3822943Z method(*args, **kwargs) 2025-12-04T11:59:23.3823097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3823138Z with policy(): 2025-12-04T11:59:23.3823293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3823335Z raise RuntimeError(msg) 2025-12-04T11:59:23.3823754Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3823756Z 2025-12-04T11:59:23.3823832Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3824110Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3824112Z 2025-12-04T11:59:23.3824201Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3824263Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3824324Z ======================= 1 failed, 7 deselected in 9.03s ======================== 2025-12-04T11:59:23.3824361Z Got exit code 1 2025-12-04T11:59:23.3824402Z Retrying single test... 2025-12-04T11:59:23.3824609Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-fa20f3f21d04c3a6.xml 2025-12-04T11:59:23.3824670Z ============================= test session starts ============================== 2025-12-04T11:59:23.3824784Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3824823Z cachedir: .pytest_cache 2025-12-04T11:59:23.3824983Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3825051Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3825091Z configfile: pytest.ini 2025-12-04T11:59:23.3825255Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3825328Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3825597Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3825640Z Running 1 items in this shard 2025-12-04T11:59:23.3825642Z 2025-12-04T11:59:23.3825996Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 11:56:56.110000 210189 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 210258 2025-12-04T11:59:23.3826152Z I1204 11:56:56.110000 210189 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 210259 2025-12-04T11:59:23.3826304Z I1204 11:56:56.111000 210189 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 210260 2025-12-04T11:59:23.3826454Z I1204 11:56:56.111000 210189 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 210261 2025-12-04T11:59:23.3826951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3827012Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3827502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3827563Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3828067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3828126Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3828606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3828663Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3828806Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3828970Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3829262Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3829437Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3829766Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3829891Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3830169Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3830317Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3830596Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3830743Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3831020Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3831157Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3831435Z [rank1]:E1204 11:57:03.214000 210259 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3831586Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3832125Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3832243Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3832438Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3832843Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3832961Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3833173Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3833338Z [rank1]:E1204 11:57:03.214000 210259 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3833377Z dist init r=1, world=4 2025-12-04T11:59:23.3833516Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3833700Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3833989Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3834144Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3834431Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3834554Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3834832Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3834985Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3835267Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3835413Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3835689Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3835826Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3836106Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3836273Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3836787Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3836901Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3837096Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3837505Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3837618Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3837829Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3838019Z [rank2]:E1204 11:57:03.215000 210260 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3838059Z dist init r=2, world=4 2025-12-04T11:59:23.3838195Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3838355Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3838643Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3838798Z [rank0]:E1204 11:57:03.226000 210258 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3839084Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3839208Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3839491Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3839676Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3839955Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3840105Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3840388Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3840550Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3840831Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3840982Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3841494Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
2025-12-04T11:59:23.3841611Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3841807Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3842213Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3842353Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3842564Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3842732Z [rank0]:E1204 11:57:03.226000 210258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3842770Z dist init r=0, world=4 2025-12-04T11:59:23.3842908Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3843068Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3843356Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3843510Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3843799Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3843925Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3844204Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3844354Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3844631Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3844780Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3845077Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3845216Z [rank3]:E1204 11:57:03.274000 210261 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3845501Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3845651Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3846172Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3846283Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3846480Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3846901Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3847016Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3847229Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3847394Z [rank3]:E1204 11:57:03.274000 210261 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3847434Z dist init r=3, world=4 2025-12-04T11:59:23.3847775Z [rank0]:[W1204 11:57:03.394562699 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3847817Z FAILED [9.0182s] [100%] 2025-12-04T11:59:23.3847819Z 2025-12-04T11:59:23.3847876Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3848015Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T11:59:23.3848060Z Traceback (most recent call last): 2025-12-04T11:59:23.3848225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3848268Z self._join_processes(fn) 2025-12-04T11:59:23.3848442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3848496Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3848676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3848722Z raise RuntimeError(error) 2025-12-04T11:59:23.3848802Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3848849Z Traceback (most recent call last): 2025-12-04T11:59:23.3849029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3849071Z getattr(self, test_name)() 2025-12-04T11:59:23.3849231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3849266Z fn() 2025-12-04T11:59:23.3849418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3849462Z method(*args, **kwargs) 2025-12-04T11:59:23.3849652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3849694Z method(*args, **kwargs) 2025-12-04T11:59:23.3849847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3849885Z with policy(): 2025-12-04T11:59:23.3850041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3850085Z raise RuntimeError(msg) 2025-12-04T11:59:23.3850470Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
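The RuntimeError above reports two pairs of numbers: caching-allocator bytes before and after the test, and driver-allocated bytes before and after. The sketch below is only an illustration of how those two counters can be observed with public APIs around a suspect operation; it is not the internal leak checker used by the test harness, and `suspect_op` is a placeholder for the code under test.

```python
# Illustrative only: observe the allocator and driver numbers quoted in the
# RuntimeError above. Not the harness's internal CudaMemoryLeakCheck.
import torch

def measure(device: int):
    torch.cuda.synchronize(device)
    alloc = torch.cuda.memory_allocated(device)    # caching-allocator bytes
    free, total = torch.cuda.mem_get_info(device)  # driver-level view of the device
    return alloc, total - free                     # (allocator bytes, driver-allocated bytes)

def check_for_leak(suspect_op, device: int = 0) -> None:
    before_alloc, before_driver = measure(device)
    suspect_op()
    torch.cuda.empty_cache()                       # release cached blocks before re-measuring
    after_alloc, after_driver = measure(device)
    if after_alloc > before_alloc:
        print(f"allocator grew: {before_alloc} -> {after_alloc} bytes")
    if after_driver > before_driver:
        print(f"driver allocation grew: {before_driver} -> {after_driver} bytes")
```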
2025-12-04T11:59:23.3850500Z 2025-12-04T11:59:23.3850580Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3850860Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3850862Z 2025-12-04T11:59:23.3850948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3850952Z 2025-12-04T11:59:23.3850954Z 2025-12-04T11:59:23.3851031Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3851118Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3851371Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-fa20f3f21d04c3a6.xml - 2025-12-04T11:59:23.3851434Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3851730Z FAILED [9.0182s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3851774Z Traceback (most recent call last): 2025-12-04T11:59:23.3851942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3851984Z getattr(self, test_name)() 2025-12-04T11:59:23.3852147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3852184Z fn() 2025-12-04T11:59:23.3852337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3852377Z method(*args, **kwargs) 2025-12-04T11:59:23.3852530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3852572Z method(*args, **kwargs) 2025-12-04T11:59:23.3852723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3852761Z with policy(): 2025-12-04T11:59:23.3852940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3852983Z raise RuntimeError(msg) 2025-12-04T11:59:23.3853368Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3853370Z 2025-12-04T11:59:23.3853449Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3853729Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3853731Z 2025-12-04T11:59:23.3853821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3853887Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3853949Z ======================= 1 failed, 7 deselected in 9.03s ======================== 2025-12-04T11:59:23.3853989Z Got exit code 1 2025-12-04T11:59:23.3854215Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3854346Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.3854580Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-b3741d276df85ab4.xml 2025-12-04T11:59:23.3854640Z ============================= test session starts ============================== 2025-12-04T11:59:23.3854751Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3854793Z cachedir: .pytest_cache 2025-12-04T11:59:23.3854951Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3855000Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3855039Z configfile: pytest.ini 2025-12-04T11:59:23.3855206Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3855276Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T11:59:23.3855332Z stepcurrent: skipping 4 already run items. 2025-12-04T11:59:23.3855373Z Running 4 items in this shard 2025-12-04T11:59:23.3855375Z 2025-12-04T11:59:23.3855730Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 11:57:07.665000 210591 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 210660 2025-12-04T11:59:23.3855887Z I1204 11:57:07.666000 210591 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 210661 2025-12-04T11:59:23.3856039Z I1204 11:57:07.666000 210591 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 210662 2025-12-04T11:59:23.3856190Z I1204 11:57:07.667000 210591 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 210663 2025-12-04T11:59:23.3856689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3856755Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3857267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3857330Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3857819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3857878Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3858366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3858422Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3858566Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3858750Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3859040Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3859198Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3859482Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3859652Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3859932Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3860082Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3860360Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3860510Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3860787Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3860927Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3861207Z [rank3]:E1204 11:57:14.805000 210663 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3861382Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3861901Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3862017Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3862216Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3862625Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3862737Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3862949Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3863136Z [rank3]:E1204 11:57:14.805000 210663 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3863176Z dist init r=3, world=4 2025-12-04T11:59:23.3863311Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3863472Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3863761Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3863917Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3864205Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3864329Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3864610Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3864757Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3865035Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3865182Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3865458Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3865616Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3865897Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3866045Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3866558Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3866675Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3866870Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3867272Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3867406Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3867614Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3867778Z [rank2]:E1204 11:57:14.813000 210662 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3867817Z dist init r=2, world=4 2025-12-04T11:59:23.3867955Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3868116Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3868408Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3868563Z [rank1]:E1204 11:57:14.822000 210661 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3868847Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3868973Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3869248Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3869397Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3869705Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3869891Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3870167Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3870300Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3870584Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3870732Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3871248Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
2025-12-04T11:59:23.3871362Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3871581Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3871983Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3872097Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3872309Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3872472Z [rank1]:E1204 11:57:14.822000 210661 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3872514Z dist init r=1, world=4 2025-12-04T11:59:23.3872650Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3872810Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3873098Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3873252Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3873538Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3873662Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3873940Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3874108Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3874386Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3874530Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3874806Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3874943Z [rank0]:E1204 11:57:14.871000 210660 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3875222Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3875371Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3875881Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3876018Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3876215Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3876622Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3876736Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3876947Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3877112Z [rank0]:E1204 11:57:14.871000 210660 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3877150Z dist init r=0, world=4 2025-12-04T11:59:23.3877492Z [rank0]:[W1204 11:57:15.230124198 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
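The ProcessGroupNCCL warning above notes that `destroy_process_group()` was never called before exit. A minimal teardown sketch, under the assumption that the distributed work is wrapped in a single callable, is to put the call in a `finally` block so it runs even when a rank raises:

```python
# Teardown sketch for the ProcessGroupNCCL warning above; setup details are
# placeholders, the point is the try/finally around the distributed work.
import torch.distributed as dist

def run_distributed(work) -> None:
    dist.init_process_group(backend="nccl")
    try:
        work()                        # collectives, FSDP forward/backward, etc.
    finally:
        dist.destroy_process_group()  # avoids the resource-leak warning at exit
```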
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3877532Z FAILED [8.9190s] [ 25%] 2025-12-04T11:59:23.3877534Z 2025-12-04T11:59:23.3877592Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3877726Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T11:59:23.3877774Z Traceback (most recent call last): 2025-12-04T11:59:23.3877937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3877979Z self._join_processes(fn) 2025-12-04T11:59:23.3878155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3878227Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3878407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3878451Z raise RuntimeError(error) 2025-12-04T11:59:23.3878530Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3878574Z Traceback (most recent call last): 2025-12-04T11:59:23.3878735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3878779Z getattr(self, test_name)() 2025-12-04T11:59:23.3878938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3878972Z fn() 2025-12-04T11:59:23.3879124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3879166Z method(*args, **kwargs) 2025-12-04T11:59:23.3879321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3879361Z method(*args, **kwargs) 2025-12-04T11:59:23.3879512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3879548Z with policy(): 2025-12-04T11:59:23.3879739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3879811Z raise RuntimeError(msg) 2025-12-04T11:59:23.3880201Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
2025-12-04T11:59:23.3880205Z 2025-12-04T11:59:23.3880283Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3880561Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3880563Z 2025-12-04T11:59:23.3880652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3880655Z 2025-12-04T11:59:23.3880657Z 2025-12-04T11:59:23.3880732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3880823Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3881074Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-b3741d276df85ab4.xml - 2025-12-04T11:59:23.3881138Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3881430Z FAILED [8.9190s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3881474Z Traceback (most recent call last): 2025-12-04T11:59:23.3881639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3881681Z getattr(self, test_name)() 2025-12-04T11:59:23.3881846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3881879Z fn() 2025-12-04T11:59:23.3882036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3882075Z method(*args, **kwargs) 2025-12-04T11:59:23.3882252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3882292Z method(*args, **kwargs) 2025-12-04T11:59:23.3882444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3882480Z with policy(): 2025-12-04T11:59:23.3882636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3882678Z raise RuntimeError(msg) 2025-12-04T11:59:23.3883066Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3883068Z 2025-12-04T11:59:23.3883142Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3883424Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3883426Z 2025-12-04T11:59:23.3883513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3883577Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3883660Z ======================= 1 failed, 4 deselected in 8.93s ======================== 2025-12-04T11:59:23.3883697Z Got exit code 1 2025-12-04T11:59:23.3883737Z Retrying single test... 2025-12-04T11:59:23.3883940Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-3967fa06d3850202.xml 2025-12-04T11:59:23.3884000Z ============================= test session starts ============================== 2025-12-04T11:59:23.3884112Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3884152Z cachedir: .pytest_cache 2025-12-04T11:59:23.3884315Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3884362Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3884400Z configfile: pytest.ini 2025-12-04T11:59:23.3884566Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3884636Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3884907Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3884949Z Running 1 items in this shard 2025-12-04T11:59:23.3884953Z 2025-12-04T11:59:23.3885305Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 11:57:19.287000 210993 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 211062 2025-12-04T11:59:23.3885460Z I1204 11:57:19.288000 210993 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 211063 2025-12-04T11:59:23.3885613Z I1204 11:57:19.288000 210993 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 211064 2025-12-04T11:59:23.3885764Z I1204 11:57:19.289000 210993 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 211065 2025-12-04T11:59:23.3886290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3886352Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3886840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3886905Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3887394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3887451Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3887938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3888016Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3888160Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3888325Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3888617Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3888776Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3889066Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3889189Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3889469Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3889654Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3889931Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3890081Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3890360Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3890521Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3890801Z [rank0]:E1204 11:57:26.313000 211062 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3890948Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3891470Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3891587Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3891783Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3892189Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3892326Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3892537Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3892702Z [rank0]:E1204 11:57:26.313000 211062 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3892742Z dist init r=0, world=4 2025-12-04T11:59:23.3892879Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3893039Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3893330Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3893485Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3893774Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3893896Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3894174Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3894322Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3894600Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3894767Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3895043Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3895179Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3895459Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3895607Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3896120Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3896235Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3896431Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3896853Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3896970Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3897179Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3897342Z [rank1]:E1204 11:57:26.314000 211063 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3897383Z dist init r=1, world=4 2025-12-04T11:59:23.3897521Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3897679Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3897969Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3898123Z [rank3]:E1204 11:57:26.315000 211065 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3898408Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3898534Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3898810Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3898977Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3911459Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3911623Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3911909Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3912052Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3912347Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3912499Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3913027Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
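Note: the UserWarning emitted from torch/distributed/fsdp/_init_utils.py above recommends either calling torch.cuda.set_device() before FSDP initialization or passing an explicit device index as the device_id argument. The following is only a minimal sketch of that fix, assuming the default process group is already initialized (the nn.Linear module is a placeholder, not the model under test):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes the default process group was already initialized,
    # e.g. by a torchrun launch plus dist.init_process_group(backend="nccl").
    rank = dist.get_rank()

    # Bind this process to one GPU before constructing FSDP, as the warning advises.
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 8)  # placeholder module, not the model under test

    # Passing an explicit device index, instead of the bare "cuda" device that
    # triggered the warning, removes the ambiguity FSDP is complaining about.
    sharded = FSDP(model, device_id=torch.device("cuda", rank))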
2025-12-04T11:59:23.3913199Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3913400Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3913812Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3913928Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3914147Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3914318Z [rank3]:E1204 11:57:26.315000 211065 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3914358Z dist init r=3, world=4 2025-12-04T11:59:23.3914503Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3914666Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3914960Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3915118Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3915410Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3915564Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3915845Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3915996Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3916276Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3916429Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3916709Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3916844Z [rank2]:E1204 11:57:26.364000 211064 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3917128Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3917303Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3917827Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3917942Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3918139Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3918551Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3918666Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3918886Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3919051Z [rank2]:E1204 11:57:26.364000 211064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3919092Z dist init r=2, world=4 2025-12-04T11:59:23.3919436Z [rank0]:[W1204 11:57:26.481518972 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3919483Z FAILED [8.8178s] [100%] 2025-12-04T11:59:23.3919486Z 2025-12-04T11:59:23.3919546Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3919724Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T11:59:23.3919772Z Traceback (most recent call last): 2025-12-04T11:59:23.3919971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3920017Z self._join_processes(fn) 2025-12-04T11:59:23.3920194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3920250Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3920430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3920477Z raise RuntimeError(error) 2025-12-04T11:59:23.3920559Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3920606Z Traceback (most recent call last): 2025-12-04T11:59:23.3920769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3920812Z getattr(self, test_name)() 2025-12-04T11:59:23.3920975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3921012Z fn() 2025-12-04T11:59:23.3921168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3921211Z method(*args, **kwargs) 2025-12-04T11:59:23.3921364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3921436Z method(*args, **kwargs) 2025-12-04T11:59:23.3921590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3921628Z with policy(): 2025-12-04T11:59:23.3921784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3921827Z raise RuntimeError(msg) 2025-12-04T11:59:23.3922226Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
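Note: the RuntimeError above is raised by PyTorch's CUDA memory-leak check, which compares caching-allocator and driver-level allocation numbers taken before and after the test body. The snippet below only illustrates that before/after comparison with public torch.cuda APIs; it is not the actual check implemented in torch.testing._internal:

    import torch

    def device_memory(device: int):
        # Bytes currently held by the caching allocator on this device.
        allocator_bytes = torch.cuda.memory_allocated(device)
        # Driver-level view of the same device: total minus free memory.
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)
        return allocator_bytes, total_bytes - free_bytes

    before = device_memory(0)
    # ... run the suspect test body here ...
    torch.cuda.synchronize(0)
    after = device_memory(0)

    if after[0] > before[0]:
        raise RuntimeError(
            f"caching allocator grew from {before[0]} to {after[0]} bytes on device 0"
        )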
2025-12-04T11:59:23.3922229Z 2025-12-04T11:59:23.3922308Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3922599Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3922602Z 2025-12-04T11:59:23.3922692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3922694Z 2025-12-04T11:59:23.3922696Z 2025-12-04T11:59:23.3922777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3922870Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3923131Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-3967fa06d3850202.xml - 2025-12-04T11:59:23.3923193Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3923494Z FAILED [8.8178s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.3923542Z Traceback (most recent call last): 2025-12-04T11:59:23.3923712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3923758Z getattr(self, test_name)() 2025-12-04T11:59:23.3923942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3923980Z fn() 2025-12-04T11:59:23.3924133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3924175Z method(*args, **kwargs) 2025-12-04T11:59:23.3924327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3924370Z method(*args, **kwargs) 2025-12-04T11:59:23.3924521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3924560Z with policy(): 2025-12-04T11:59:23.3924718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3924760Z raise RuntimeError(msg) 2025-12-04T11:59:23.3925153Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3925155Z 2025-12-04T11:59:23.3925232Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3925518Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3925545Z 2025-12-04T11:59:23.3925634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3925701Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3925764Z ======================= 1 failed, 7 deselected in 8.83s ======================== 2025-12-04T11:59:23.3925805Z Got exit code 1 2025-12-04T11:59:23.3925844Z Retrying single test... 2025-12-04T11:59:23.3926056Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d90c84a1d19c859a.xml 2025-12-04T11:59:23.3926114Z ============================= test session starts ============================== 2025-12-04T11:59:23.3926232Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3926274Z cachedir: .pytest_cache 2025-12-04T11:59:23.3926439Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3926487Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3926529Z configfile: pytest.ini 2025-12-04T11:59:23.3926696Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3926773Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3927046Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3927089Z Running 1 items in this shard 2025-12-04T11:59:23.3927091Z 2025-12-04T11:59:23.3927443Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 11:57:30.695000 211395 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 211464 2025-12-04T11:59:23.3927602Z I1204 11:57:30.696000 211395 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 211465 2025-12-04T11:59:23.3927757Z I1204 11:57:30.696000 211395 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 211466 2025-12-04T11:59:23.3927931Z I1204 11:57:30.697000 211395 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 211467 2025-12-04T11:59:23.3928441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3928507Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3929005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3929065Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3929565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3929698Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3930195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3930254Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3930399Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3930566Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3930861Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3931020Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3931312Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3931439Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3931722Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3931874Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3932155Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3932328Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3932610Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3932750Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3933030Z [rank3]:E1204 11:57:37.875000 211467 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3933180Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3933702Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2243952640 and is now 3454009344. 2025-12-04T11:59:23.3933818Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3934015Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3934454Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3934571Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3934786Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3934954Z [rank3]:E1204 11:57:37.875000 211467 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3934993Z dist init r=3, world=4 2025-12-04T11:59:23.3935133Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3935294Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3935587Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3935746Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3936032Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3936159Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3936439Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3936606Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3936887Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3937035Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3937314Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3937453Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3937734Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3937884Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3938402Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3938539Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3938739Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3939149Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3939262Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3939482Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3939686Z [rank1]:E1204 11:57:37.882000 211465 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3939726Z dist init r=1, world=4 2025-12-04T11:59:23.3939864Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3940031Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3940320Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3940476Z [rank2]:E1204 11:57:37.901000 211466 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3940767Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3940892Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3941201Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3941350Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3941632Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3941783Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3942065Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3942207Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3942485Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3942662Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3943179Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 
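Note: each failure banner prints a repro command to be run from the base repo dir with PYTORCH_TEST_WITH_ROCM and PYTORCH_TEST_CUDA_MEM_LEAK_CHECK set. A small sketch of invoking that exact command from Python with the same environment flags; the cwd value is the CI checkout path seen in this log and should point at a local clone instead:

    import os
    import subprocess

    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
        # Setting PYTORCH_PRINT_REPRO_ON_FAILURE="0" instead would silence the repro banner.
    )
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_exec_order.py",
            "TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda",
        ],
        cwd="/var/lib/jenkins/pytorch",  # CI checkout path from this log; adjust locally
        env=env,
        check=True,
    )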
2025-12-04T11:59:23.3943297Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3943497Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3943905Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3944021Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3944234Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3944401Z [rank2]:E1204 11:57:37.901000 211466 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3944440Z dist init r=2, world=4 2025-12-04T11:59:23.3944579Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3944740Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3945036Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3945192Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3945497Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3945625Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3945906Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3946059Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3946339Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3946489Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3946770Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3946908Z [rank0]:E1204 11:57:37.964000 211464 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3947208Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3947357Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3947882Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.3947994Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3948195Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3948605Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3948718Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3948934Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3949097Z [rank0]:E1204 11:57:37.964000 211464 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3949136Z dist init r=0, world=4 2025-12-04T11:59:23.3949477Z [rank0]:[W1204 11:57:38.333104418 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3949518Z FAILED [9.0195s] [100%] 2025-12-04T11:59:23.3949539Z 2025-12-04T11:59:23.3949631Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3949768Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T11:59:23.3949815Z Traceback (most recent call last): 2025-12-04T11:59:23.3949978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3950023Z self._join_processes(fn) 2025-12-04T11:59:23.3950197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3950252Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3950430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3950475Z raise RuntimeError(error) 2025-12-04T11:59:23.3950555Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3950600Z Traceback (most recent call last): 2025-12-04T11:59:23.3950763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3950807Z getattr(self, test_name)() 2025-12-04T11:59:23.3950967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3951038Z fn() 2025-12-04T11:59:23.3951191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3951230Z method(*args, **kwargs) 2025-12-04T11:59:23.3951384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3951423Z method(*args, **kwargs) 2025-12-04T11:59:23.3951580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3951616Z with policy(): 2025-12-04T11:59:23.3951770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3951810Z raise RuntimeError(msg) 2025-12-04T11:59:23.3952202Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2243952640 and is now 3454009344. 
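Note: the ProcessGroupNCCL warning above asks for an explicit destroy_process_group() call before the program exits. A minimal sketch of the usual pattern, assuming a torchrun-style launch that provides RANK, WORLD_SIZE and LOCAL_RANK:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        # Assumes torchrun (or an equivalent launcher) set RANK, WORLD_SIZE and LOCAL_RANK.
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
        try:
            pass  # training / test body goes here
        finally:
            # Explicit teardown releases NCCL resources and avoids the
            # "destroy_process_group() was not called" warning seen above.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()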
2025-12-04T11:59:23.3952206Z 2025-12-04T11:59:23.3952281Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3952563Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3952566Z 2025-12-04T11:59:23.3952656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3952658Z 2025-12-04T11:59:23.3952660Z 2025-12-04T11:59:23.3952735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3952825Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3953077Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d90c84a1d19c859a.xml - 2025-12-04T11:59:23.3953138Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3953458Z FAILED [9.0195s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.3953503Z Traceback (most recent call last): 2025-12-04T11:59:23.3953669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3953713Z getattr(self, test_name)() 2025-12-04T11:59:23.3953874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3953911Z fn() 2025-12-04T11:59:23.3954063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3954103Z method(*args, **kwargs) 2025-12-04T11:59:23.3954255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3954297Z method(*args, **kwargs) 2025-12-04T11:59:23.3954450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3954486Z with policy(): 2025-12-04T11:59:23.3954643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3954683Z raise RuntimeError(msg) 2025-12-04T11:59:23.3955073Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2243952640 and is now 3454009344. 2025-12-04T11:59:23.3955095Z 2025-12-04T11:59:23.3955169Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3955453Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3955454Z 2025-12-04T11:59:23.3955543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3955610Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3955672Z ======================= 1 failed, 7 deselected in 9.03s ======================== 2025-12-04T11:59:23.3955710Z Got exit code 1 2025-12-04T11:59:23.3955937Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T11:59:23.3956069Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.3956276Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-303f589842289b36.xml 2025-12-04T11:59:23.3956334Z ============================= test session starts ============================== 2025-12-04T11:59:23.3956449Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3956489Z cachedir: .pytest_cache 2025-12-04T11:59:23.3956652Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3956698Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3956739Z configfile: pytest.ini 2025-12-04T11:59:23.3956906Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3956978Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T11:59:23.3957031Z stepcurrent: skipping 5 already run items. 2025-12-04T11:59:23.3957075Z Running 3 items in this shard 2025-12-04T11:59:23.3957077Z 2025-12-04T11:59:23.3957462Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 11:57:42.383000 211797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 211866 2025-12-04T11:59:23.3957622Z I1204 11:57:42.384000 211797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 211867 2025-12-04T11:59:23.3957777Z I1204 11:57:42.384000 211797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 211868 2025-12-04T11:59:23.3957934Z I1204 11:57:42.385000 211797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 211869 2025-12-04T11:59:23.3958441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3958503Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3958999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3959082Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3959624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3959685Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3960177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3960235Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3960381Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3960548Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3960842Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3961000Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3961290Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3961416Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3961698Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3961871Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3962152Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3962300Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3962584Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3962723Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3963007Z [rank1]:E1204 11:57:49.491000 211867 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3963157Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3963678Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3963822Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3964020Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3964430Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3964545Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3964759Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3964928Z [rank1]:E1204 11:57:49.491000 211867 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.3964966Z dist init r=1, world=4 2025-12-04T11:59:23.3965108Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3965269Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3965559Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3965715Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3966005Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3966152Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3966432Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3966580Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3966860Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3967009Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3967288Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3967428Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3967708Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3967875Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3968398Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3968511Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3968709Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3969117Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3969229Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3969444Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3969650Z [rank3]:E1204 11:57:49.500000 211869 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3969688Z dist init r=3, world=4 2025-12-04T11:59:23.3969826Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3969993Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3970281Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3970459Z [rank0]:E1204 11:57:49.565000 211866 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3970747Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3970872Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3971155Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3971302Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3971585Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3971732Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3972014Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3972178Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3972459Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3972611Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3973127Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
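The RuntimeError above is raised by the memory-leak-check context manager in common_utils.py, which compares per-device allocation counters taken before and after the test body. A rough illustration of that comparison (not the harness implementation; driver_allocated and check_leak are hypothetical names) using public torch.cuda counters:

    import torch

    def driver_allocated(device: int) -> int:
        # Driver-level usage: total minus free, analogous to the numbers in the log.
        free, total = torch.cuda.mem_get_info(device)
        return total - free

    def check_leak(fn, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)   # caching allocator bytes
        driver_before = driver_allocated(device)
        fn()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        driver_after = driver_allocated(device)
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA memory leak on device {device}: "
                f"caching allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )

The "confirmed" wording in the log corresponds to the case where both the caching-allocator count and the driver-level count grow, which is what this sketch tests for.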
2025-12-04T11:59:23.3973243Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3973439Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3973848Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3973960Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3974174Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3974344Z [rank0]:E1204 11:57:49.565000 211866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.3974381Z dist init r=0, world=4 2025-12-04T11:59:23.3974519Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3974698Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3974991Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3975144Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3975435Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3975558Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3975845Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3975994Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3976275Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3976442Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3976724Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3976864Z [rank2]:E1204 11:57:49.588000 211868 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3977146Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3977298Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3977818Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3977933Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3978130Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3978537Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3978653Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3978865Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3979175Z [rank2]:E1204 11:57:49.588000 211868 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3979213Z dist init r=2, world=4 2025-12-04T11:59:23.3979556Z [rank0]:[W1204 11:57:49.902602146 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.3979641Z FAILED [8.9192s] [ 33%] 2025-12-04T11:59:23.3979645Z 2025-12-04T11:59:23.3979700Z =================================== FAILURES =================================== 2025-12-04T11:59:23.3979835Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T11:59:23.3979880Z Traceback (most recent call last): 2025-12-04T11:59:23.3980045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.3980087Z self._join_processes(fn) 2025-12-04T11:59:23.3980265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.3980318Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.3980497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.3980540Z raise RuntimeError(error) 2025-12-04T11:59:23.3980649Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3980693Z Traceback (most recent call last): 2025-12-04T11:59:23.3980856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3980898Z getattr(self, test_name)() 2025-12-04T11:59:23.3981058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3981097Z fn() 2025-12-04T11:59:23.3981250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3981290Z method(*args, **kwargs) 2025-12-04T11:59:23.3981444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3981483Z method(*args, **kwargs) 2025-12-04T11:59:23.3981635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3981678Z with policy(): 2025-12-04T11:59:23.3981833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3981872Z raise RuntimeError(msg) 2025-12-04T11:59:23.3982269Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
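The ProcessGroupNCCL warning near the top of this block ("destroy_process_group() was not called before program exit") points at missing teardown in the test process. A minimal sketch of the recommended cleanup, with placeholder work and launcher-provided environment assumed (main is a hypothetical entry point):

    import torch.distributed as dist

    def main() -> None:
        # Rank, world size and master address come from the launcher (e.g. torchrun).
        dist.init_process_group("nccl")
        try:
            pass  # distributed work goes here
        finally:
            dist.destroy_process_group()  # explicit teardown avoids the warning

    if __name__ == "__main__":
        main()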
2025-12-04T11:59:23.3982272Z 2025-12-04T11:59:23.3982347Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3982630Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3982637Z 2025-12-04T11:59:23.3982725Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3982727Z 2025-12-04T11:59:23.3982729Z 2025-12-04T11:59:23.3982803Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.3982890Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.3983170Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-303f589842289b36.xml - 2025-12-04T11:59:23.3983230Z =========================== short test summary info ============================ 2025-12-04T11:59:23.3983524Z FAILED [8.9192s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.3983571Z Traceback (most recent call last): 2025-12-04T11:59:23.3983737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3983779Z getattr(self, test_name)() 2025-12-04T11:59:23.3983941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3983975Z fn() 2025-12-04T11:59:23.3984129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3984169Z method(*args, **kwargs) 2025-12-04T11:59:23.3984321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3984360Z method(*args, **kwargs) 2025-12-04T11:59:23.3984513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3984573Z with policy(): 2025-12-04T11:59:23.3984726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3984767Z raise RuntimeError(msg) 2025-12-04T11:59:23.3985157Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.3985159Z 2025-12-04T11:59:23.3985235Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3985517Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3985521Z 2025-12-04T11:59:23.3985608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3985671Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.3985733Z ======================= 1 failed, 5 deselected in 8.93s ======================== 2025-12-04T11:59:23.3985768Z Got exit code 1 2025-12-04T11:59:23.3985808Z Retrying single test... 2025-12-04T11:59:23.3986017Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f15b22de1b66815c.xml 2025-12-04T11:59:23.3986074Z ============================= test session starts ============================== 2025-12-04T11:59:23.3986189Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.3986228Z cachedir: .pytest_cache 2025-12-04T11:59:23.3986391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.3986437Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.3986477Z configfile: pytest.ini 2025-12-04T11:59:23.3986640Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.3986711Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.3987019Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3987062Z Running 1 items in this shard 2025-12-04T11:59:23.3987064Z 2025-12-04T11:59:23.3987416Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 11:57:54.005000 212199 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 212268 2025-12-04T11:59:23.3987575Z I1204 11:57:54.006000 212199 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 212269 2025-12-04T11:59:23.3987730Z I1204 11:57:54.006000 212199 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 212270 2025-12-04T11:59:23.3987884Z I1204 11:57:54.007000 212199 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 212271 2025-12-04T11:59:23.3988390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3988451Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3988975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.3989034Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3989529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3989625Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3990119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.3990175Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.3990323Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3990492Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3990786Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3990946Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3991235Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3991384Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3991665Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3991813Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3992094Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3992241Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3992521Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3992658Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3992937Z [rank3]:E1204 11:58:01.186000 212271 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3993119Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3993652Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.3993768Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3993965Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3994374Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3994490Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3994704Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3994872Z [rank3]:E1204 11:58:01.186000 212271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.3994911Z dist init r=3, world=4 2025-12-04T11:59:23.3995049Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3995211Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.3995505Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.3995678Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.3995966Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.3996092Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.3996372Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3996520Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.3996802Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.3996950Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.3997228Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.3997392Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.3997672Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.3997821Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.3998340Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.3998455Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3998654Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.3999061Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.3999173Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.3999388Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.3999557Z [rank2]:E1204 11:58:01.194000 212270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.3999634Z dist init r=2, world=4 2025-12-04T11:59:23.3999772Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.3999957Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4000247Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4000404Z [rank0]:E1204 11:58:01.263000 212268 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4000697Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4000823Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4001105Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4001254Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4001534Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4001707Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4001987Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4002125Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4002406Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4002555Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4003074Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
2025-12-04T11:59:23.4003193Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4003392Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4003801Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4003917Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4004131Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4004316Z [rank0]:E1204 11:58:01.263000 212268 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4004356Z dist init r=0, world=4 2025-12-04T11:59:23.4004493Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4004653Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4004944Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4005098Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4005390Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4005515Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4005793Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4005962Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4006240Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4006389Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4006667Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4006805Z [rank1]:E1204 11:58:01.265000 212269 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4007087Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4007236Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4007754Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 2025-12-04T11:59:23.4007868Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4008065Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4008473Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4008604Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4008818Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4008982Z [rank1]:E1204 11:58:01.265000 212269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4009019Z dist init r=1, world=4 2025-12-04T11:59:23.4009361Z [rank0]:[W1204 11:58:01.568085854 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.4009399Z FAILED [9.0188s] [100%] 2025-12-04T11:59:23.4009401Z 2025-12-04T11:59:23.4009456Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4009630Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T11:59:23.4009675Z Traceback (most recent call last): 2025-12-04T11:59:23.4009839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4009882Z self._join_processes(fn) 2025-12-04T11:59:23.4010056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4010139Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4010317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4010361Z raise RuntimeError(error) 2025-12-04T11:59:23.4010438Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4010484Z Traceback (most recent call last): 2025-12-04T11:59:23.4010645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4010687Z getattr(self, test_name)() 2025-12-04T11:59:23.4010848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4010883Z fn() 2025-12-04T11:59:23.4011033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4011078Z method(*args, **kwargs) 2025-12-04T11:59:23.4011229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4011268Z method(*args, **kwargs) 2025-12-04T11:59:23.4011420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4011456Z with policy(): 2025-12-04T11:59:23.4011613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4011655Z raise RuntimeError(msg) 2025-12-04T11:59:23.4012054Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 
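The repeated "Process N exited with error code 10 and exception" wrapper comes from the multi-process test harness joining its worker processes and re-raising when one exits nonzero (the _join_processes / _check_return_codes frames in the traceback). A simplified, hypothetical sketch of that join-and-check pattern (not the common_distributed.py code; the exit code 10 only mirrors what this log shows):

    import multiprocessing as mp

    def _worker(rank: int) -> None:
        # A failing worker reports the error to the parent via its exit code.
        raise SystemExit(10 if rank == 3 else 0)

    def run_workers(world_size: int = 4) -> None:
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_worker, args=(r,)) for r in range(world_size)]
        for p in procs:
            p.start()
        for rank, p in enumerate(procs):
            p.join()
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        run_workers()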
2025-12-04T11:59:23.4012058Z 2025-12-04T11:59:23.4012134Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4012414Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4012416Z 2025-12-04T11:59:23.4012525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4012527Z 2025-12-04T11:59:23.4012529Z 2025-12-04T11:59:23.4012604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4012691Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.4012939Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f15b22de1b66815c.xml - 2025-12-04T11:59:23.4012999Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4013293Z FAILED [9.0188s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4013336Z Traceback (most recent call last): 2025-12-04T11:59:23.4013503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4013544Z getattr(self, test_name)() 2025-12-04T11:59:23.4013706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4013742Z fn() 2025-12-04T11:59:23.4013893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4013962Z method(*args, **kwargs) 2025-12-04T11:59:23.4014114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4014152Z method(*args, **kwargs) 2025-12-04T11:59:23.4014302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4014340Z with policy(): 2025-12-04T11:59:23.4014494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4014536Z raise RuntimeError(msg) 2025-12-04T11:59:23.4014928Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.4014931Z 2025-12-04T11:59:23.4015005Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4015283Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4015285Z 2025-12-04T11:59:23.4015371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4015434Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.4015496Z ======================= 1 failed, 7 deselected in 9.03s ======================== 2025-12-04T11:59:23.4015534Z Got exit code 1 2025-12-04T11:59:23.4015572Z Retrying single test... 2025-12-04T11:59:23.4015780Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d8c3dbf45df3e6f1.xml 2025-12-04T11:59:23.4015839Z ============================= test session starts ============================== 2025-12-04T11:59:23.4015952Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4015991Z cachedir: .pytest_cache 2025-12-04T11:59:23.4016152Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4016196Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4016258Z configfile: pytest.ini 2025-12-04T11:59:23.4016422Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4016494Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.4016766Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4016810Z Running 1 items in this shard 2025-12-04T11:59:23.4016813Z 2025-12-04T11:59:23.4017169Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 11:58:05.660000 212601 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 212670 2025-12-04T11:59:23.4017328Z I1204 11:58:05.661000 212601 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 212671 2025-12-04T11:59:23.4017483Z I1204 11:58:05.662000 212601 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 212672 2025-12-04T11:59:23.4017634Z I1204 11:58:05.662000 212601 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 212673 2025-12-04T11:59:23.4018131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4018217Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4018716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.4018775Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4019275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4019334Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4019859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4019916Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4020057Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4020222Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4020516Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4020701Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4020992Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4021117Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4021397Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4021546Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4021827Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4021974Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4022253Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4022418Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4022699Z [rank0]:E1204 11:58:12.865000 212670 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4022851Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4023370Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.4023488Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4023688Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4024100Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4024216Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4024427Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4024597Z [rank0]:E1204 11:58:12.865000 212670 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4024634Z dist init r=0, world=4 2025-12-04T11:59:23.4024774Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4024952Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4025243Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4025396Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4025682Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4025809Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4026093Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4026243Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T11:59:23.4026523Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4026688Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4026969Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4027106Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4027386Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4027535Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4028054Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3454009344. 2025-12-04T11:59:23.4028172Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4028370Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4028781Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4028895Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4029109Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4029307Z [rank3]:E1204 11:58:12.880000 212673 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4029347Z dist init r=3, world=4 2025-12-04T11:59:23.4029483Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4029684Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4029975Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4030132Z [rank1]:E1204 11:58:12.887000 212671 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4030425Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4030548Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4030829Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4030980Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4031287Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4031436Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4031715Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4031858Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4032138Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4032288Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4032804Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3521118208. 
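[editor's sketch] The RuntimeError above describes the mechanism of the memory-leak check: per-device memory counters are snapshotted before the test and compared afterwards, at both the caching-allocator level and the driver level. A minimal Python illustration of that idea follows; it is an assumption-laden sketch, not the actual leak-check context manager in torch/testing/_internal/common_utils.py.

    import torch

    def snapshot(device: int):
        # Caching-allocator view and driver-level view of allocated memory on one device.
        alloc = torch.cuda.memory_allocated(device)
        free, total = torch.cuda.mem_get_info(device)
        return alloc, total - free

    def check_no_leak(device: int, test_fn):
        alloc_before, driver_before = snapshot(device)
        test_fn()
        torch.cuda.synchronize(device)
        alloc_after, driver_after = snapshot(device)
        if alloc_after > alloc_before or driver_after > driver_before:
            raise RuntimeError(
                f"leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )

The numbers in the log (e.g. 512 -> 3584 allocator bytes, ~2.25 GB -> ~3.45 GB driver memory on device 3) are exactly this kind of before/after pair.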
2025-12-04T11:59:23.4032919Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4033116Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4033531Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4033667Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4033879Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4034046Z [rank1]:E1204 11:58:12.887000 212671 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4034083Z dist init r=1, world=4 2025-12-04T11:59:23.4034224Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4034386Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4034677Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4034834Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4035122Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4035248Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4035543Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4035695Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4035975Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4036122Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4036398Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4036537Z [rank2]:E1204 11:58:12.892000 212672 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4036826Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4036973Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4037489Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3504340992. 2025-12-04T11:59:23.4037605Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4037800Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4038226Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4038338Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4038551Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4038717Z [rank2]:E1204 11:58:12.892000 212672 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4038754Z dist init r=2, world=4 2025-12-04T11:59:23.4039094Z [rank0]:[W1204 11:58:13.031702599 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T11:59:23.4039132Z FAILED [9.1209s] [100%] 2025-12-04T11:59:23.4039134Z 2025-12-04T11:59:23.4039188Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4039323Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T11:59:23.4039367Z Traceback (most recent call last): 2025-12-04T11:59:23.4039553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4039626Z self._join_processes(fn) 2025-12-04T11:59:23.4039799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4039854Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4040034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4040076Z raise RuntimeError(error) 2025-12-04T11:59:23.4040156Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.4040202Z Traceback (most recent call last): 2025-12-04T11:59:23.4040364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4040408Z getattr(self, test_name)() 2025-12-04T11:59:23.4040567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4040601Z fn() 2025-12-04T11:59:23.4040751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4040791Z method(*args, **kwargs) 2025-12-04T11:59:23.4040944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4040985Z method(*args, **kwargs) 2025-12-04T11:59:23.4041138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4041176Z with policy(): 2025-12-04T11:59:23.4041330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4041373Z raise RuntimeError(msg) 2025-12-04T11:59:23.4041772Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 
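[editor's sketch] The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit, which can leak resources") points at the documented shutdown step. A minimal sketch of that shutdown follows; the init arguments are placeholders, not the values the test harness actually uses.

    import torch.distributed as dist

    # Placeholder init; the harness supplies rank/world_size/store itself.
    dist.init_process_group(backend="nccl", init_method="env://")
    try:
        pass  # training or test body
    finally:
        # Explicit shutdown avoids the "was not called before program exit" warning.
        dist.destroy_process_group()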
2025-12-04T11:59:23.4041775Z 2025-12-04T11:59:23.4041877Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4042162Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4042164Z 2025-12-04T11:59:23.4042252Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4042254Z 2025-12-04T11:59:23.4042256Z 2025-12-04T11:59:23.4042333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4042423Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.4042678Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d8c3dbf45df3e6f1.xml - 2025-12-04T11:59:23.4042738Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4043038Z FAILED [9.1209s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T11:59:23.4043083Z Traceback (most recent call last): 2025-12-04T11:59:23.4043249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4043294Z getattr(self, test_name)() 2025-12-04T11:59:23.4043481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4043517Z fn() 2025-12-04T11:59:23.4043670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4043711Z method(*args, **kwargs) 2025-12-04T11:59:23.4043865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4043905Z method(*args, **kwargs) 2025-12-04T11:59:23.4044058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4044093Z with policy(): 2025-12-04T11:59:23.4044246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4044287Z raise RuntimeError(msg) 2025-12-04T11:59:23.4044680Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 2459959296 and is now 3663724544. 2025-12-04T11:59:23.4044682Z 2025-12-04T11:59:23.4044757Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4045039Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4045043Z 2025-12-04T11:59:23.4045129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4045195Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.4045256Z ======================= 1 failed, 7 deselected in 9.13s ======================== 2025-12-04T11:59:23.4045297Z Got exit code 1 2025-12-04T11:59:23.4045525Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T11:59:23.4045655Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.4045888Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-407fc34d7680a29b.xml 2025-12-04T11:59:23.4045947Z ============================= test session starts ============================== 2025-12-04T11:59:23.4046058Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4046099Z cachedir: .pytest_cache 2025-12-04T11:59:23.4046261Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4046308Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4046347Z configfile: pytest.ini 2025-12-04T11:59:23.4046511Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4046582Z collecting ... collected 8 items / 6 deselected / 2 selected 2025-12-04T11:59:23.4046635Z stepcurrent: skipping 6 already run items. 2025-12-04T11:59:23.4046678Z Running 2 items in this shard 2025-12-04T11:59:23.4046680Z 2025-12-04T11:59:23.4046986Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 11:58:17.322000 213003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 213072 2025-12-04T11:59:23.4047143Z I1204 11:58:17.323000 213003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 213073 2025-12-04T11:59:23.4047318Z I1204 11:58:17.323000 213003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 213074 2025-12-04T11:59:23.4047471Z I1204 11:58:17.324000 213003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 213075 2025-12-04T11:59:23.4047978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4048041Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4048537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
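[editor's sketch] The repeated UserWarning above already states the two remedies: call torch.cuda.set_device() before FSDP initialization, or pass an indexed device as device_id instead of the bare "cuda" string. A short sketch of both follows; `model` and `rank` are placeholder names, not taken from test_fsdp_exec_order.py.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap(model: torch.nn.Module, rank: int) -> FSDP:
        # Remedy 1: bind this process to an explicit device before FSDP init.
        torch.cuda.set_device(rank)
        # Remedy 2: pass an indexed device rather than the ambiguous "cuda".
        return FSDP(model, device_id=torch.device("cuda", rank))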
2025-12-04T11:59:23.4048599Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4049098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4049154Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4049689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4049746Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4049893Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4050082Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4050378Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4050537Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4050828Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4050955Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4051238Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4051389Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4051668Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4051840Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4052119Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4052258Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4052541Z [rank2]:E1204 11:58:24.491000 213074 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4052689Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4053172Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3506438144. 2025-12-04T11:59:23.4053292Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4053487Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4053843Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4053959Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4054172Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4054360Z [rank2]:E1204 11:58:24.491000 213074 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4054399Z dist init r=2, world=4 2025-12-04T11:59:23.4054536Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4054696Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4054983Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4055139Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4055429Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4055554Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4055833Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4055980Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4056281Z [rank1]:E1204 11:58:24.498000 213073 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4056428Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4056710Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4056849Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4057131Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4057283Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4057753Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4057868Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4058065Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4058419Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4058535Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4058765Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4058933Z [rank1]:E1204 11:58:24.498000 213073 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4058972Z dist init r=1, world=4 2025-12-04T11:59:23.4059111Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4059273Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4059566Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4059815Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4060100Z [rank3]:E1204 11:58:24.504000 213075 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4060225Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4060537Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4060685Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4060965Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4061115Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4061393Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4061535Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4061821Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4061973Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4062441Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 
2025-12-04T11:59:23.4062556Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4062755Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4063135Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4063248Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4063462Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4063628Z [rank3]:E1204 11:58:24.504000 213075 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4063670Z dist init r=3, world=4 2025-12-04T11:59:23.4063807Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4063972Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4064264Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4064421Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4064709Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4064850Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4065131Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4065279Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4065560Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4065706Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4065987Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4066124Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.4066405Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4066555Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4067020Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 2459959296 and is now 3665821696. 2025-12-04T11:59:23.4067134Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4067347Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4067697Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4067809Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4068023Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4068189Z [rank0]:E1204 11:58:24.572000 213072 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4068226Z dist init r=0, world=4 2025-12-04T11:59:23.4068264Z FAILED [8.3162s] [ 50%] 2025-12-04T11:59:23.4068268Z 2025-12-04T11:59:23.4068323Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4068418Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T11:59:23.4068464Z Traceback (most recent call last): 2025-12-04T11:59:23.4068629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4068690Z self._join_processes(fn) 2025-12-04T11:59:23.4068863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4068916Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4069098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4069140Z raise RuntimeError(error) 2025-12-04T11:59:23.4069222Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4069265Z Traceback (most recent call last): 2025-12-04T11:59:23.4069429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4069470Z getattr(self, test_name)() 2025-12-04T11:59:23.4069664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T11:59:23.4069701Z fn() 2025-12-04T11:59:23.4069853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4069893Z method(*args, **kwargs) 2025-12-04T11:59:23.4070045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4070085Z method(*args, **kwargs) 2025-12-04T11:59:23.4070237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4070274Z with policy(): 2025-12-04T11:59:23.4070427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4070468Z raise RuntimeError(msg) 2025-12-04T11:59:23.4070811Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4070815Z 2025-12-04T11:59:23.4070891Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4071120Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4071146Z 2025-12-04T11:59:23.4071234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4071237Z 2025-12-04T11:59:23.4071238Z 2025-12-04T11:59:23.4071314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4071401Z Process 1 terminated with exit code 10, terminating remaining processes. 
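[editor's sketch] The captured-stdout line above ("Process 1 terminated with exit code 10, terminating remaining processes.") comes from the parent process joining its spawned workers, as the _join_processes/_check_return_codes frames in the traceback indicate. A rough sketch of that behaviour using plain multiprocessing follows; it is an illustration, not the actual common_distributed.py logic.

    import multiprocessing as mp

    def join_and_check(processes: list[mp.Process]) -> None:
        for rank, proc in enumerate(processes):
            proc.join()
            if proc.exitcode != 0:
                print(f"Process {rank} terminated with exit code {proc.exitcode}, "
                      "terminating remaining processes.")
                for other in processes:
                    if other.is_alive():
                        other.terminate()
                raise RuntimeError(
                    f"Process {rank} exited with error code {proc.exitcode}"
                )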
2025-12-04T11:59:23.4071656Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-407fc34d7680a29b.xml - 2025-12-04T11:59:23.4071717Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4071959Z FAILED [8.3162s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4072002Z Traceback (most recent call last): 2025-12-04T11:59:23.4072171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4072211Z getattr(self, test_name)() 2025-12-04T11:59:23.4072375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4072408Z fn() 2025-12-04T11:59:23.4072561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4072624Z method(*args, **kwargs) 2025-12-04T11:59:23.4072776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4072814Z method(*args, **kwargs) 2025-12-04T11:59:23.4072966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4073003Z with policy(): 2025-12-04T11:59:23.4073160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4073200Z raise RuntimeError(msg) 2025-12-04T11:59:23.4073543Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4073547Z 2025-12-04T11:59:23.4073621Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4073846Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4073848Z 2025-12-04T11:59:23.4073936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4073999Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.4074061Z ======================= 1 failed, 6 deselected in 8.33s ======================== 2025-12-04T11:59:23.4074096Z Got exit code 1 2025-12-04T11:59:23.4074135Z Retrying single test... 
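[editor's sketch] The lines above ("Got exit code 1", "Retrying single test...", and earlier "FAILED CONSISTENTLY ... continuing with the rest of the tests due to continue-through-error being set") show the harness's retry policy: retry the single failing test once, then either abort or keep running the shard. A hedged sketch of that flow follows; the function names are hypothetical, not the actual test-runner code.

    def run_with_retry(run_single_test, test_id: str, continue_through_error: bool) -> bool:
        # run_single_test is assumed to return the pytest exit code for one test id.
        code = run_single_test(test_id)
        if code == 0:
            return True
        print(f"Got exit code {code}")
        print("Retrying single test...")
        if run_single_test(test_id) == 0:
            return True  # flaky: passed on the retry
        print(f"FAILED CONSISTENTLY: {test_id}")
        if not continue_through_error:
            raise SystemExit(1)
        return False  # recorded as failed, but the shard keeps running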
2025-12-04T11:59:23.4074342Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-46c3c722696407a3.xml 2025-12-04T11:59:23.4074399Z ============================= test session starts ============================== 2025-12-04T11:59:23.4074513Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4074554Z cachedir: .pytest_cache 2025-12-04T11:59:23.4074713Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4074760Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4074798Z configfile: pytest.ini 2025-12-04T11:59:23.4074986Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4075058Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.4075276Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4075319Z Running 1 items in this shard 2025-12-04T11:59:23.4075323Z 2025-12-04T11:59:23.4075627Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 11:58:28.202000 213397 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 213466 2025-12-04T11:59:23.4075786Z I1204 11:58:28.203000 213397 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 213467 2025-12-04T11:59:23.4075941Z I1204 11:58:28.203000 213397 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 213468 2025-12-04T11:59:23.4076092Z I1204 11:58:28.204000 213397 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 213469 2025-12-04T11:59:23.4076596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4076683Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4077181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4077239Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4077732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.4077790Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4078283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4078339Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4078483Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4078646Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4078943Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4079099Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4079405Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4079532Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4079843Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4079994Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4080274Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4080423Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4080704Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4080841Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4081146Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4081294Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4081768Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4081884Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4082086Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4082440Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4082554Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4082766Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4082934Z [rank3]:E1204 11:58:35.392000 213469 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4082973Z dist init r=3, world=4 2025-12-04T11:59:23.4083113Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4083273Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4083587Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4083740Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4084027Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4084152Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4084431Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4084581Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4084859Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4085007Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4085286Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4085442Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4085722Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4085871Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4086338Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4086452Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4086653Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4087006Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4087121Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4087339Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4087509Z [rank1]:E1204 11:58:35.395000 213467 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4087547Z dist init r=1, world=4 2025-12-04T11:59:23.4087685Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4087865Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4088156Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4088313Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4088599Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4088725Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4089009Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4089158Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4089440Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4089664Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4089945Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4090084Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4090365Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4090515Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4090987Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3506438144. 
2025-12-04T11:59:23.4091103Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4091301Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4091655Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4091769Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4091985Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4092180Z [rank2]:E1204 11:58:35.458000 213468 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4092220Z dist init r=2, world=4 2025-12-04T11:59:23.4092357Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4092520Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4092811Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4092967Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4093258Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4093381Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4093662Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4093834Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4094114Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4094262Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4094540Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4094677Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.4094959Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4095112Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4095581Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 2459959296 and is now 3665821696. 2025-12-04T11:59:23.4095696Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4095895Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4096248Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4096360Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4096589Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4096756Z [rank0]:E1204 11:58:35.464000 213466 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4096792Z dist init r=0, world=4 2025-12-04T11:59:23.4096830Z FAILED [8.4182s] [100%] 2025-12-04T11:59:23.4096832Z 2025-12-04T11:59:23.4096888Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4096985Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T11:59:23.4097029Z Traceback (most recent call last): 2025-12-04T11:59:23.4097191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4097233Z self._join_processes(fn) 2025-12-04T11:59:23.4097408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4097461Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4097640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4097684Z raise RuntimeError(error) 2025-12-04T11:59:23.4097762Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4097825Z Traceback (most recent call last): 2025-12-04T11:59:23.4097987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4098028Z getattr(self, test_name)() 2025-12-04T11:59:23.4098188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T11:59:23.4098222Z fn() 2025-12-04T11:59:23.4098376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4098416Z method(*args, **kwargs) 2025-12-04T11:59:23.4098567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4098606Z method(*args, **kwargs) 2025-12-04T11:59:23.4098758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4098797Z with policy(): 2025-12-04T11:59:23.4098950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4098991Z raise RuntimeError(msg) 2025-12-04T11:59:23.4099341Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4099343Z 2025-12-04T11:59:23.4099418Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4099686Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4099689Z 2025-12-04T11:59:23.4099776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4099780Z 2025-12-04T11:59:23.4099840Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4099883Z Traceback (most recent call last): 2025-12-04T11:59:23.4100048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4100088Z getattr(self, test_name)() 2025-12-04T11:59:23.4100275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4100309Z fn() 2025-12-04T11:59:23.4100461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4100499Z method(*args, **kwargs) 2025-12-04T11:59:23.4100652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4100691Z method(*args, **kwargs) 2025-12-04T11:59:23.4100843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4100878Z with policy(): 2025-12-04T11:59:23.4101032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4101072Z raise RuntimeError(msg) 2025-12-04T11:59:23.4101414Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 
2025-12-04T11:59:23.4101416Z 2025-12-04T11:59:23.4101491Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4101718Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4101744Z 2025-12-04T11:59:23.4101831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4101834Z 2025-12-04T11:59:23.4101835Z 2025-12-04T11:59:23.4101909Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4101997Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.4102249Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-46c3c722696407a3.xml - 2025-12-04T11:59:23.4102310Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4102552Z FAILED [8.4182s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4102599Z Traceback (most recent call last): 2025-12-04T11:59:23.4102766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4102807Z getattr(self, test_name)() 2025-12-04T11:59:23.4102971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4103005Z fn() 2025-12-04T11:59:23.4103159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4103197Z method(*args, **kwargs) 2025-12-04T11:59:23.4103351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4103389Z method(*args, **kwargs) 2025-12-04T11:59:23.4103540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4103579Z with policy(): 2025-12-04T11:59:23.4103732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4103772Z raise RuntimeError(msg) 2025-12-04T11:59:23.4104131Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 
2025-12-04T11:59:23.4104133Z 2025-12-04T11:59:23.4104204Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4104427Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4104429Z 2025-12-04T11:59:23.4104517Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4104521Z 2025-12-04T11:59:23.4104580Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4104623Z Traceback (most recent call last): 2025-12-04T11:59:23.4104785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4104827Z getattr(self, test_name)() 2025-12-04T11:59:23.4104990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4105023Z fn() 2025-12-04T11:59:23.4105174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4105214Z method(*args, **kwargs) 2025-12-04T11:59:23.4105365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4105423Z method(*args, **kwargs) 2025-12-04T11:59:23.4105576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4105612Z with policy(): 2025-12-04T11:59:23.4105766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4105806Z raise RuntimeError(msg) 2025-12-04T11:59:23.4106149Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4106151Z 2025-12-04T11:59:23.4106224Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4106447Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4106452Z 2025-12-04T11:59:23.4106538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4106601Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.4106662Z ======================= 1 failed, 7 deselected in 8.43s ======================== 2025-12-04T11:59:23.4106698Z Got exit code 1 2025-12-04T11:59:23.4106738Z Retrying single test... 
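The RuntimeError repeated for every rank above comes from a pre/post comparison of per-device memory counters: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the harness records the caching-allocator figure before the test and flags the test when the number is still higher afterwards, with the driver-level numbers quoted as confirmation. The following is a much-simplified sketch of that idea, not PyTorch's actual leak checker in torch/testing/_internal/common_utils.py.

    # Simplified illustration of a CUDA memory-leak check. PyTorch's real check is
    # more involved (it also consults the driver-level figures shown in the error
    # message above); this only compares caching-allocator usage across a block.
    import torch

    class SimpleLeakCheck:
        """Fail if caching-allocator usage on `device` grew across the block."""

        def __init__(self, device: int = 0):
            self.device = device

        def __enter__(self):
            torch.cuda.synchronize(self.device)
            self.before = torch.cuda.memory_allocated(self.device)
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                return False  # keep the test's own exception
            torch.cuda.synchronize(self.device)
            after = torch.cuda.memory_allocated(self.device)
            if after > self.before:
                raise RuntimeError(
                    f"possible leak on device {self.device}: "
                    f"{self.before} -> {after} bytes still allocated"
                )
            return False

For instance, `with SimpleLeakCheck(1): run_test()` would raise for the 512 -> 3584 byte growth reported on device 1 in the failures above.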
2025-12-04T11:59:23.4106947Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-3760c1e6f8841a9d.xml 2025-12-04T11:59:23.4107003Z ============================= test session starts ============================== 2025-12-04T11:59:23.4107120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4107159Z cachedir: .pytest_cache 2025-12-04T11:59:23.4107320Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4107367Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4107406Z configfile: pytest.ini 2025-12-04T11:59:23.4107570Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4107641Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.4107882Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4107925Z Running 1 items in this shard 2025-12-04T11:59:23.4107927Z 2025-12-04T11:59:23.4108228Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 11:58:39.310000 213791 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 213860 2025-12-04T11:59:23.4108387Z I1204 11:58:39.311000 213791 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 213861 2025-12-04T11:59:23.4108540Z I1204 11:58:39.311000 213791 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 213862 2025-12-04T11:59:23.4108692Z I1204 11:58:39.312000 213791 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 213863 2025-12-04T11:59:23.4109198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4109259Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4109815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4109875Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4110371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.4110428Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4110920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4110978Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4111124Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4111292Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4111587Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4111747Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4112038Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4112193Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4112475Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4112624Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4112909Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4113058Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4113339Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4113476Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4113758Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4113935Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4114403Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4114520Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4114721Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4115072Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4115188Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4115402Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4115569Z [rank1]:E1204 11:58:46.353000 213861 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4115606Z dist init r=1, world=4 2025-12-04T11:59:23.4115744Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4115904Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4116196Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4116351Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4116654Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4116779Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4117061Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4117211Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4117493Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4117644Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4117926Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4118062Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4118363Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4118511Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4118980Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4119094Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4119294Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4119682Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4119795Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4120009Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4120174Z [rank3]:E1204 11:58:46.361000 213863 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4120213Z dist init r=3, world=4 2025-12-04T11:59:23.4120351Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4120513Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4120826Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4120979Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4121266Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4121390Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4121677Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4121828Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4122106Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4122252Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4122561Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4122698Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4122979Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4123128Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4123594Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 2300575744 and is now 3506438144. 
2025-12-04T11:59:23.4123711Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4123907Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4124262Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4124375Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4124587Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4124753Z [rank2]:E1204 11:58:46.406000 213862 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4124790Z dist init r=2, world=4 2025-12-04T11:59:23.4124927Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4125105Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4125397Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4125550Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4125839Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4125962Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4126244Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4126394Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4126671Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4126838Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4127118Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4127256Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.4127536Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4127684Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4128153Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 2459959296 and is now 3665821696. 2025-12-04T11:59:23.4128267Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4128466Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4128817Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4128932Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4129146Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4129327Z [rank0]:E1204 11:58:46.409000 213860 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4129366Z dist init r=0, world=4 2025-12-04T11:59:23.4129404Z FAILED [8.3182s] [100%] 2025-12-04T11:59:23.4129406Z 2025-12-04T11:59:23.4129461Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4129554Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T11:59:23.4129643Z Traceback (most recent call last): 2025-12-04T11:59:23.4129809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4129853Z self._join_processes(fn) 2025-12-04T11:59:23.4130027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4130082Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4130263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4130305Z raise RuntimeError(error) 2025-12-04T11:59:23.4130384Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4130429Z Traceback (most recent call last): 2025-12-04T11:59:23.4130589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4130656Z getattr(self, test_name)() 2025-12-04T11:59:23.4130816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T11:59:23.4130852Z fn() 2025-12-04T11:59:23.4131003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4131043Z method(*args, **kwargs) 2025-12-04T11:59:23.4131196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4131236Z method(*args, **kwargs) 2025-12-04T11:59:23.4131391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4131426Z with policy(): 2025-12-04T11:59:23.4131583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4131624Z raise RuntimeError(msg) 2025-12-04T11:59:23.4131967Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4131969Z 2025-12-04T11:59:23.4132043Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4132270Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4132272Z 2025-12-04T11:59:23.4132358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4132360Z 2025-12-04T11:59:23.4132419Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4132462Z Traceback (most recent call last): 2025-12-04T11:59:23.4132628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4132669Z getattr(self, test_name)() 2025-12-04T11:59:23.4132831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4132866Z fn() 2025-12-04T11:59:23.4133044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4133164Z method(*args, **kwargs) 2025-12-04T11:59:23.4133320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4133360Z method(*args, **kwargs) 2025-12-04T11:59:23.4133513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4133549Z with policy(): 2025-12-04T11:59:23.4133705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4133746Z raise RuntimeError(msg) 2025-12-04T11:59:23.4134083Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 
2025-12-04T11:59:23.4134087Z 2025-12-04T11:59:23.4134162Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4134385Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4134387Z 2025-12-04T11:59:23.4134474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4134476Z 2025-12-04T11:59:23.4134502Z 2025-12-04T11:59:23.4134576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4134662Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T11:59:23.4134913Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-3760c1e6f8841a9d.xml - 2025-12-04T11:59:23.4134973Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4135217Z FAILED [8.3182s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4135261Z Traceback (most recent call last): 2025-12-04T11:59:23.4135427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4135468Z getattr(self, test_name)() 2025-12-04T11:59:23.4135636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4135669Z fn() 2025-12-04T11:59:23.4135824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4135862Z method(*args, **kwargs) 2025-12-04T11:59:23.4136016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4136054Z method(*args, **kwargs) 2025-12-04T11:59:23.4136206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4136241Z with policy(): 2025-12-04T11:59:23.4136393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4136433Z raise RuntimeError(msg) 2025-12-04T11:59:23.4136774Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 
2025-12-04T11:59:23.4136776Z 2025-12-04T11:59:23.4136849Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4137091Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4137093Z 2025-12-04T11:59:23.4137180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4137182Z 2025-12-04T11:59:23.4137239Z Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4137284Z Traceback (most recent call last): 2025-12-04T11:59:23.4137446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4137489Z getattr(self, test_name)() 2025-12-04T11:59:23.4137650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4137683Z fn() 2025-12-04T11:59:23.4137837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4137877Z method(*args, **kwargs) 2025-12-04T11:59:23.4138030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4138068Z method(*args, **kwargs) 2025-12-04T11:59:23.4138223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4140096Z with policy(): 2025-12-04T11:59:23.4140262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4140356Z raise RuntimeError(msg) 2025-12-04T11:59:23.4140708Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4140710Z 2025-12-04T11:59:23.4140787Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4141014Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4141016Z 2025-12-04T11:59:23.4141102Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4141166Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:59:23.4141232Z ======================= 1 failed, 7 deselected in 8.33s ======================== 2025-12-04T11:59:23.4141270Z Got exit code 1 2025-12-04T11:59:23.4141445Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T11:59:23.4141574Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.4141784Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-e1a1f11b70cb3331.xml 2025-12-04T11:59:23.4141844Z ============================= test session starts ============================== 2025-12-04T11:59:23.4141959Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4141999Z cachedir: .pytest_cache 2025-12-04T11:59:23.4142160Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4142209Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4142249Z configfile: pytest.ini 2025-12-04T11:59:23.4142415Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4142486Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.4142575Z stepcurrent: skipping 7 already run items. 2025-12-04T11:59:23.4142619Z Running 1 items in this shard 2025-12-04T11:59:23.4142621Z 2025-12-04T11:59:23.4142927Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 11:58:50.031000 214185 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 214254 2025-12-04T11:59:23.4143084Z I1204 11:58:50.031000 214185 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 214255 2025-12-04T11:59:23.4143243Z I1204 11:58:50.032000 214185 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 214256 2025-12-04T11:59:23.4143395Z I1204 11:58:50.032000 214185 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 214257 2025-12-04T11:59:23.4143905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4143966Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4144467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.4144552Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4145050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4145108Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4145601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4145660Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4145806Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4145973Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4146270Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4146429Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4146723Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4146851Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4147151Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4147302Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4147583Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4147733Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4148012Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4148149Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4148434Z [rank1]:E1204 11:58:57.230000 214255 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4148583Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4149079Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4149199Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4149398Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4149808Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4149924Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4150140Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4150307Z [rank1]:E1204 11:58:57.230000 214255 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4150347Z dist init r=1, world=4 2025-12-04T11:59:23.4150485Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4150648Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4150939Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4151096Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4151417Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4151543Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4151823Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4151974Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4152253Z [rank2]:E1204 11:58:57.237000 214256 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4152404Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4152683Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4152821Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4153101Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4153279Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4153749Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 2300575744 and is now 3506438144. 2025-12-04T11:59:23.4153866Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4154065Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4154421Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4154535Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4154749Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4154917Z [rank2]:E1204 11:58:57.237000 214256 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4154954Z dist init r=2, world=4 2025-12-04T11:59:23.4155094Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4155256Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4155544Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4155728Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4156015Z [rank0]:E1204 11:58:57.238000 214254 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4156139Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4156426Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4156576Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4156857Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4157006Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4157287Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4159062Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4159344Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4159496Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4160010Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 2459959296 and is now 3665821696. 
2025-12-04T11:59:23.4160126Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4160327Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4160685Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4160797Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4161013Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4161179Z [rank0]:E1204 11:58:57.238000 214254 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4161217Z dist init r=0, world=4 2025-12-04T11:59:23.4161354Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4161516Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4161836Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4161994Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4162282Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4162406Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4162691Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4162838Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4163119Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4163266Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4163573Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4163709Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.4163991Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4164140Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4164612Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4164727Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4164926Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4165280Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4165392Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4165606Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4165773Z [rank3]:E1204 11:58:57.293000 214257 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4165809Z dist init r=3, world=4 2025-12-04T11:59:23.4165866Z FAILED [8.3175s] [100%] 2025-12-04T11:59:23.4165868Z 2025-12-04T11:59:23.4165924Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4166023Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T11:59:23.4166068Z Traceback (most recent call last): 2025-12-04T11:59:23.4166233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4166277Z self._join_processes(fn) 2025-12-04T11:59:23.4166453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4166505Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4166686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4166732Z raise RuntimeError(error) 2025-12-04T11:59:23.4166811Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4166855Z Traceback (most recent call last): 2025-12-04T11:59:23.4167017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4167060Z getattr(self, test_name)() 2025-12-04T11:59:23.4167220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T11:59:23.4167275Z fn() 2025-12-04T11:59:23.4167426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4167467Z method(*args, **kwargs) 2025-12-04T11:59:23.4167620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4167660Z method(*args, **kwargs) 2025-12-04T11:59:23.4167813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4167850Z with policy(): 2025-12-04T11:59:23.4168003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4168044Z raise RuntimeError(msg) 2025-12-04T11:59:23.4168384Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4168388Z 2025-12-04T11:59:23.4168463Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4168688Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4168691Z 2025-12-04T11:59:23.4168778Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4168780Z 2025-12-04T11:59:23.4168782Z 2025-12-04T11:59:23.4168857Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4168944Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.4169202Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-e1a1f11b70cb3331.xml - 2025-12-04T11:59:23.4169263Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4169510Z FAILED [8.3175s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4169554Z Traceback (most recent call last): 2025-12-04T11:59:23.4169798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4169839Z getattr(self, test_name)() 2025-12-04T11:59:23.4169999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4170032Z fn() 2025-12-04T11:59:23.4170186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4170226Z method(*args, **kwargs) 2025-12-04T11:59:23.4170380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4170418Z method(*args, **kwargs) 2025-12-04T11:59:23.4170570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4170607Z with policy(): 2025-12-04T11:59:23.4170761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4170803Z raise RuntimeError(msg) 2025-12-04T11:59:23.4171142Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4171170Z 2025-12-04T11:59:23.4171245Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4171470Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4171473Z 2025-12-04T11:59:23.4171560Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4171624Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.4171687Z ======================= 1 failed, 7 deselected in 8.33s ======================== 2025-12-04T11:59:23.4171723Z Got exit code 1 2025-12-04T11:59:23.4171763Z Retrying single test... 
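Note on the failure mode above: this shard runs the mem_leak_check configuration, so PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 wraps each test in a check that snapshots per-device caching-allocator and driver-reported memory before the test and compares afterwards; any growth is raised as the "CUDA driver API confirmed a leak" RuntimeError seen in every rank's traceback. The repro command printed with each failure re-runs the single test under the same environment variables, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 only silences that hint. The sketch below is a simplified, self-contained illustration of that kind of before/after check, not the implementation in torch/testing/_internal/common_utils.py; the name cuda_leak_check and the threshold-free comparison are assumptions made for the example.

# Simplified illustration of a per-device CUDA/ROCm memory leak check
# (assumed names; not the check used by common_utils.py).
import contextlib

import torch

@contextlib.contextmanager
def cuda_leak_check(device: int = 0):
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)      # caching-allocator bytes in use
    free_before, total = torch.cuda.mem_get_info(device)    # driver-level free/total bytes
    driver_before = total - free_before
    try:
        yield
    finally:
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        if alloc_after > alloc_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator went from "
                f"{alloc_before} to {alloc_after} bytes; driver-allocated memory went from "
                f"{driver_before} to {driver_after} bytes"
            )

Used as `with cuda_leak_check(torch.cuda.current_device()): ...` around a test body, this reproduces the shape of the numbers quoted in the errors above (512 -> 3584/4096 bytes on the caching allocator, roughly 2.3 GB -> 3.5 GB at the driver level).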
2025-12-04T11:59:23.4171977Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-8927b7ab7fcfe23b.xml 2025-12-04T11:59:23.4172034Z ============================= test session starts ============================== 2025-12-04T11:59:23.4172148Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4172189Z cachedir: .pytest_cache 2025-12-04T11:59:23.4172350Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4172394Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4172434Z configfile: pytest.ini 2025-12-04T11:59:23.4172600Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4172673Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.4172893Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4172939Z Running 1 items in this shard 2025-12-04T11:59:23.4172942Z 2025-12-04T11:59:23.4173247Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 11:59:00.957000 214579 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 214648 2025-12-04T11:59:23.4173403Z I1204 11:59:00.958000 214579 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 214649 2025-12-04T11:59:23.4173586Z I1204 11:59:00.958000 214579 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 214650 2025-12-04T11:59:23.4173740Z I1204 11:59:00.959000 214579 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 214651 2025-12-04T11:59:23.4174256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4174318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4174817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4174876Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4175370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.4175446Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4175938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4175995Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4176140Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4176307Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4176607Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4176766Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4177058Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4177185Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4177465Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4177616Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4177915Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4178063Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4178342Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4178480Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4178762Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4178914Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4179387Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4179503Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4179764Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4180117Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4180234Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4180447Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4180615Z [rank3]:E1204 11:59:08.072000 214651 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4180654Z dist init r=3, world=4 2025-12-04T11:59:23.4180794Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4180955Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4181246Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4181403Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4181692Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4181819Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4182102Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4182279Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4182558Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4182706Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4182988Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4183125Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4183409Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4183560Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4184035Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4184180Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4184378Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4184732Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4184844Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4185058Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4185225Z [rank1]:E1204 11:59:08.076000 214649 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4185264Z dist init r=1, world=4 2025-12-04T11:59:23.4185402Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4185564Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4185853Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4186007Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4186301Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4186424Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4186722Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4186871Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4187149Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4187298Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4187578Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4187717Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4187996Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4188145Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4188645Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 2459959296 and is now 3665821696. 
2025-12-04T11:59:23.4188763Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4188960Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4189312Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4189428Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4189674Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4189841Z [rank0]:E1204 11:59:08.090000 214648 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4189879Z dist init r=0, world=4 2025-12-04T11:59:23.4190019Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4190181Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4190473Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4190629Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4190949Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4191074Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4191352Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4191503Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4191783Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4191933Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4192211Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4192348Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.4192629Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4192803Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4193275Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 2300575744 and is now 3506438144. 2025-12-04T11:59:23.4193389Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4193587Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4193942Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4194054Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4194272Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4194437Z [rank2]:E1204 11:59:08.092000 214650 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4194476Z dist init r=2, world=4 2025-12-04T11:59:23.4194513Z FAILED [8.3183s] [100%] 2025-12-04T11:59:23.4194517Z 2025-12-04T11:59:23.4194572Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4194668Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T11:59:23.4194713Z Traceback (most recent call last): 2025-12-04T11:59:23.4194875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4194919Z self._join_processes(fn) 2025-12-04T11:59:23.4195112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4195167Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4195347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4195390Z raise RuntimeError(error) 2025-12-04T11:59:23.4195469Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4195516Z Traceback (most recent call last): 2025-12-04T11:59:23.4195678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4195721Z getattr(self, test_name)() 2025-12-04T11:59:23.4195881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T11:59:23.4195915Z fn() 2025-12-04T11:59:23.4196069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4196108Z method(*args, **kwargs) 2025-12-04T11:59:23.4196262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4196300Z method(*args, **kwargs) 2025-12-04T11:59:23.4196452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4196512Z with policy(): 2025-12-04T11:59:23.4196666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4196705Z raise RuntimeError(msg) 2025-12-04T11:59:23.4197047Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4197049Z 2025-12-04T11:59:23.4197124Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4197346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4197348Z 2025-12-04T11:59:23.4197437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4197440Z 2025-12-04T11:59:23.4197442Z 2025-12-04T11:59:23.4197515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4197603Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T11:59:23.4197857Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-8927b7ab7fcfe23b.xml - 2025-12-04T11:59:23.4197917Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4198159Z FAILED [8.3183s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T11:59:23.4198205Z Traceback (most recent call last): 2025-12-04T11:59:23.4198369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4198412Z getattr(self, test_name)() 2025-12-04T11:59:23.4198572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4198606Z fn() 2025-12-04T11:59:23.4198759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4198821Z method(*args, **kwargs) 2025-12-04T11:59:23.4198977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4199016Z method(*args, **kwargs) 2025-12-04T11:59:23.4199167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4199204Z with policy(): 2025-12-04T11:59:23.4199357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4199399Z raise RuntimeError(msg) 2025-12-04T11:59:23.4199785Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4199788Z 2025-12-04T11:59:23.4199863Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4200086Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4200088Z 2025-12-04T11:59:23.4200175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4200238Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:59:23.4200326Z ======================= 1 failed, 7 deselected in 8.33s ======================== 2025-12-04T11:59:23.4200362Z Got exit code 1 2025-12-04T11:59:23.4200400Z Retrying single test... 
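Note on the repeated UserWarning from torch/distributed/fsdp/_init_utils.py: each rank passed device_id as the bare string "cuda" with no index, so FSDP falls back to that rank's current device. The warning itself suggests either calling torch.cuda.set_device() before wrapping or passing an explicit device index. The snippet below is an illustrative per-rank setup along those lines, assuming a torchrun-style launcher that exports LOCAL_RANK and a process group that is already initialized; wrap_with_fsdp is a made-up helper, not code from the test file.

# Illustrative fix for the `device_id` warning: select this rank's GPU explicitly
# and hand FSDP an integer index instead of the bare "cuda" string.
# Assumes LOCAL_RANK is set by the launcher and init_process_group() already ran.
import os

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_with_fsdp(model: torch.nn.Module) -> FSDP:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)           # make this rank's GPU the current device
    return FSDP(model, device_id=local_rank)    # explicit index removes the ambiguity

Passing device_id=torch.device("cuda", local_rank) is an equivalent way to supply the explicit index the warning asks for.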
2025-12-04T11:59:23.4200608Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-e8a73bea9050c490.xml 2025-12-04T11:59:23.4200664Z ============================= test session starts ============================== 2025-12-04T11:59:23.4200779Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4200819Z cachedir: .pytest_cache 2025-12-04T11:59:23.4200979Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4201023Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4201064Z configfile: pytest.ini 2025-12-04T11:59:23.4201227Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4201300Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:59:23.4201518Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4201560Z Running 1 items in this shard 2025-12-04T11:59:23.4201562Z 2025-12-04T11:59:23.4201868Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 11:59:11.821000 214973 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 215042 2025-12-04T11:59:23.4202025Z I1204 11:59:11.822000 214973 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 215043 2025-12-04T11:59:23.4202180Z I1204 11:59:11.822000 214973 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 215044 2025-12-04T11:59:23.4202333Z I1204 11:59:11.823000 214973 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 215045 2025-12-04T11:59:23.4202866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4202927Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4203425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4203486Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4203981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T11:59:23.4204039Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4204532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T11:59:23.4204619Z device_from_device_id = _get_device_from_device_id( 2025-12-04T11:59:23.4204763Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4204927Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4205221Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4205375Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4205664Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4205791Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4206077Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4206227Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4206507Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4206656Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4206934Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4207091Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4207372Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4207521Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4207993Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 2459959296 and is now 3665821696. 2025-12-04T11:59:23.4208110Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4208309Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4208662Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4208775Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4209008Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4209174Z [rank0]:E1204 11:59:18.963000 215042 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T11:59:23.4209211Z dist init r=0, world=4 2025-12-04T11:59:23.4209351Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4209513Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4209842Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4209999Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4210286Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4210413Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4210693Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4210842Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4211123Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4211270Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4211576Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4211713Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4211997Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4212149Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4212624Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 2300575744 and is now 3506438144. 2025-12-04T11:59:23.4212739Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4212935Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4213289Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4213429Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4213644Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4213809Z [rank2]:E1204 11:59:18.971000 215044 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T11:59:23.4213847Z dist init r=2, world=4 2025-12-04T11:59:23.4213985Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4214147Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4214441Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4214594Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4214884Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4215006Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4215287Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4215436Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4215739Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4215887Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4216164Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4216301Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T11:59:23.4216586Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4216735Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4217204Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 
2025-12-04T11:59:23.4217319Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4217537Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4217890Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4218003Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4218217Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4218383Z [rank3]:E1204 11:59:18.996000 215045 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T11:59:23.4218422Z dist init r=3, world=4 2025-12-04T11:59:23.4218560Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T11:59:23.4218721Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T11:59:23.4219017Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4219173Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T11:59:23.4219459Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4219621Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T11:59:23.4219899Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4220082Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4220359Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4220507Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T11:59:23.4220787Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4220924Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T11:59:23.4221206Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4221355Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T11:59:23.4221827Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 2317352960 and is now 3523215360. 2025-12-04T11:59:23.4221963Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4222162Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4222514Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4222625Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T11:59:23.4222839Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4223005Z [rank1]:E1204 11:59:19.004000 215043 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T11:59:23.4223043Z dist init r=1, world=4 2025-12-04T11:59:23.4223079Z FAILED [8.4184s] [100%] 2025-12-04T11:59:23.4223081Z 2025-12-04T11:59:23.4223138Z =================================== FAILURES =================================== 2025-12-04T11:59:23.4223233Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T11:59:23.4223278Z Traceback (most recent call last): 2025-12-04T11:59:23.4223440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T11:59:23.4223483Z self._join_processes(fn) 2025-12-04T11:59:23.4223657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T11:59:23.4223713Z self._check_return_codes(fn, elapsed_time) 2025-12-04T11:59:23.4223893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T11:59:23.4223936Z raise RuntimeError(error) 2025-12-04T11:59:23.4224014Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4224077Z Traceback (most recent call last): 2025-12-04T11:59:23.4224239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4224280Z getattr(self, test_name)() 2025-12-04T11:59:23.4224439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T11:59:23.4224472Z fn() 2025-12-04T11:59:23.4224624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4224664Z method(*args, **kwargs) 2025-12-04T11:59:23.4224818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4224856Z method(*args, **kwargs) 2025-12-04T11:59:23.4225010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4225046Z with policy(): 2025-12-04T11:59:23.4225200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4225239Z raise RuntimeError(msg) 2025-12-04T11:59:23.4225582Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4225605Z 2025-12-04T11:59:23.4225679Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4225903Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4225905Z 2025-12-04T11:59:23.4225994Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4225996Z 2025-12-04T11:59:23.4225998Z 2025-12-04T11:59:23.4226072Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:59:23.4226159Z Process 3 terminated with exit code 10, terminating remaining processes. 
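The _join_processes / _check_return_codes frames in the traceback above show the multiprocess-harness pattern behind the "terminated with exit code 10, terminating remaining processes" message: the parent watches all spawned ranks and, as soon as one exits non-zero, tears down the rest and re-raises. The following is a simplified, self-contained illustration of that pattern, not the torch.testing._internal.common_distributed implementation; the worker function and the hard-coded exit code 10 are stand-ins.

import multiprocessing as mp
import sys
import time

def worker(rank: int) -> None:
    # stand-in for a per-rank test body; rank 3 mimics the leak-check
    # failure by exiting with the same error code seen in the log
    time.sleep(0.1)
    sys.exit(10 if rank == 3 else 0)

if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=(r,)) for r in range(4)]
    for p in procs:
        p.start()
    while True:
        for rank, p in enumerate(procs):
            if p.exitcode not in (None, 0):
                # one rank failed: terminate the survivors and surface it
                for other in procs:
                    if other.is_alive():
                        other.terminate()
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")
        if all(p.exitcode == 0 for p in procs):
            break
        time.sleep(0.05)
    print("all ranks exited cleanly")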
2025-12-04T11:59:23.4226409Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-e8a73bea9050c490.xml - 2025-12-04T11:59:23.4226470Z =========================== short test summary info ============================ 2025-12-04T11:59:23.4226711Z FAILED [8.4184s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T11:59:23.4226755Z Traceback (most recent call last): 2025-12-04T11:59:23.4226922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T11:59:23.4226964Z getattr(self, test_name)() 2025-12-04T11:59:23.4227124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T11:59:23.4227158Z fn() 2025-12-04T11:59:23.4227310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4227350Z method(*args, **kwargs) 2025-12-04T11:59:23.4227503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:59:23.4227542Z method(*args, **kwargs) 2025-12-04T11:59:23.4227694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:59:23.4227730Z with policy(): 2025-12-04T11:59:23.4227903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:59:23.4227943Z raise RuntimeError(msg) 2025-12-04T11:59:23.4228289Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3456106496. 2025-12-04T11:59:23.4228291Z 2025-12-04T11:59:23.4228364Z To execute this test, run the following from the base repo dir: 2025-12-04T11:59:23.4228588Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4228590Z 2025-12-04T11:59:23.4228676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:59:23.4228739Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
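Separately from the leak itself, the UserWarning from torch/distributed/fsdp/_init_utils.py printed earlier in this run spells out its own remedy: either make the rank's GPU the current device before constructing FSDP, or pass an indexed device rather than the bare "cuda" string as device_id. A minimal sketch of both options follows; the rendezvous settings, the setup_fsdp helper, and the Linear stand-in model are assumptions for illustration, not code from the failing test.

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_fsdp(rank: int, world_size: int) -> FSDP:
    # assumed single-node rendezvous config for the sketch
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # option 1: set the current device for this rank before FSDP init,
    # so device resolution no longer has to guess an index
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(8, 8)  # stand-in for the test's model
    # option 2: pass an explicit device index instead of the bare "cuda"
    return FSDP(model, device_id=torch.cuda.current_device())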
2025-12-04T11:59:23.4228800Z ======================= 1 failed, 7 deselected in 8.43s ======================== 2025-12-04T11:59:23.4228837Z Got exit code 1 2025-12-04T11:59:23.4229010Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T11:59:23.4229138Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:59:23.4229344Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-4f67c54b925ea078.xml 2025-12-04T11:59:23.4229424Z ============================= test session starts ============================== 2025-12-04T11:59:23.4229537Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:59:23.4229625Z cachedir: .pytest_cache 2025-12-04T11:59:23.4229788Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:59:23.4229832Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:59:23.4229872Z configfile: pytest.ini 2025-12-04T11:59:23.4230035Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:59:23.4230106Z collecting ... collected 8 items / 8 deselected / 0 selected 2025-12-04T11:59:23.4230157Z stepcurrent: skipping 8 already run items. 2025-12-04T11:59:23.4230200Z Running 0 items in this shard 2025-12-04T11:59:23.4230203Z 2025-12-04T11:59:23.4230454Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-4f67c54b925ea078.xml - 2025-12-04T11:59:23.4230512Z ============================ 8 deselected in 0.00s ============================= 2025-12-04T11:59:23.4232019Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda'] 2025-12-04T11:59:23.4232048Z 2025-12-04T11:59:23.4232252Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order 1/1 (test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_b71b860b1a78e6ee_.log) 2025-12-04T11:59:23.4232254Z 2025-12-04T11:59:23.4232387Z Finished distributed/fsdp/test_fsdp_exec_order 1/1 ... 
[2025-12-04 11:59:23.325635][5227604.304670961], took 4.32min 2025-12-04T11:59:23.4232679Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T11:59:23.4232767Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:59:23.4232861Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:59:23.4232907Z Uploading artifacts took 0.00 seconds 2025-12-04T11:59:23.4232965Z distributed/fsdp/test_fsdp_exec_order 1/1 failed! 2025-12-04T11:59:23.4233079Z Running distributed/test_distributed_spawn 2/7 ... [2025-12-04 11:59:23.327918][5227604.306959872] 2025-12-04T11:59:23.4233144Z MPI not available -- MPI backend tests will be skipped 2025-12-04T11:59:23.4233226Z Running distributed tests for the test backend with env init_method 2025-12-04T11:59:23.4233273Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:59:23.4233615Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=2', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:59:23.328420] 2025-12-04T11:59:25.1902694Z 2025-12-04T11:59:25.1904012Z distributed/test_distributed_spawn 2/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_2.7_b13ac23383d709c6_.log 2025-12-04T11:59:25.1904959Z Running 0 items in this shard: 2025-12-04T11:59:25.1905197Z 2025-12-04T11:59:25.1908857Z Running distributed tests for the test backend with file init_method 2025-12-04T11:59:25.1909638Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:59:25.1912532Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=2', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:59:25.191092] 2025-12-04T11:59:27.0556709Z 2025-12-04T11:59:27.0557720Z distributed/test_distributed_spawn 2/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_2.7_0b90106487d58c16_.log 2025-12-04T11:59:27.0558483Z Running 0 items in this shard: 2025-12-04T11:59:27.0558677Z 2025-12-04T11:59:27.0563113Z Running distributed tests for the nccl backend with env init_method 2025-12-04T11:59:27.0564890Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:59:27.0566501Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=2', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:59:27.056470] 2025-12-04T12:02:28.2516980Z 2025-12-04T12:02:28.2518340Z distributed/test_distributed_spawn 2/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_2.7_f619207b10d6dd9a_.log 2025-12-04T12:02:28.2532449Z Running 41 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:02:28.2541593Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T12:02:28.2542257Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T12:02:28.2542830Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half 2025-12-04T12:02:28.2543383Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T12:02:28.2543905Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group 2025-12-04T12:02:28.2544419Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T12:02:28.2544951Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T12:02:28.2545436Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T12:02:28.2545827Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T12:02:28.2546203Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T12:02:28.2546574Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T12:02:28.2546981Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T12:02:28.2547329Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T12:02:28.2547676Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T12:02:28.2548045Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T12:02:28.2548455Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex 2025-12-04T12:02:28.2548889Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex 2025-12-04T12:02:28.2549291Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T12:02:28.2549685Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T12:02:28.2550037Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl 2025-12-04T12:02:28.2550400Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T12:02:28.2550770Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T12:02:28.2551167Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T12:02:28.2551533Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group 2025-12-04T12:02:28.2551880Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T12:02:28.2552238Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group 2025-12-04T12:02:28.2552590Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T12:02:28.2553029Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T12:02:28.2553457Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager 2025-12-04T12:02:28.2553854Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T12:02:28.2554259Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T12:02:28.2554634Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T12:02:28.2554996Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T12:02:28.2555343Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T12:02:28.2555686Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group 2025-12-04T12:02:28.2556046Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T12:02:28.2556396Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source 2025-12-04T12:02:28.2556803Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler 2025-12-04T12:02:28.2557170Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T12:02:28.2557542Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T12:02:28.2557934Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:02:28.2558149Z 2025-12-04T12:02:28.2558238Z Running distributed tests for the nccl backend with file init_method 2025-12-04T12:02:28.2558411Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:02:28.2558840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=2', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:02:28.253032] 2025-12-04T12:05:27.3885469Z 2025-12-04T12:05:27.3889192Z distributed/test_distributed_spawn 2/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_2.7_3465749de903e098_.log 2025-12-04T12:05:27.3901097Z Running 41 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:05:27.3909518Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T12:05:27.3910158Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T12:05:27.3910611Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half 2025-12-04T12:05:27.3911051Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T12:05:27.3911471Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group 2025-12-04T12:05:27.3911932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T12:05:27.3912351Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T12:05:27.3912752Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T12:05:27.3913140Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T12:05:27.3913520Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T12:05:27.3913898Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T12:05:27.3914268Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T12:05:27.3914620Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T12:05:27.3914967Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T12:05:27.3915338Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T12:05:27.3915784Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex 2025-12-04T12:05:27.3916219Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex 2025-12-04T12:05:27.3916629Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T12:05:27.3916985Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T12:05:27.3917336Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl 2025-12-04T12:05:27.3917696Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T12:05:27.3918081Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T12:05:27.3918484Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T12:05:27.3918855Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group 2025-12-04T12:05:27.3919205Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T12:05:27.3919564Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group 2025-12-04T12:05:27.3919982Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T12:05:27.3920365Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T12:05:27.3920770Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager 2025-12-04T12:05:27.3921147Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T12:05:27.3921523Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T12:05:27.3921932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T12:05:27.3922276Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T12:05:27.3922619Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T12:05:27.3922957Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group 2025-12-04T12:05:27.3923296Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T12:05:27.3923646Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source 2025-12-04T12:05:27.3924024Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler 2025-12-04T12:05:27.3924396Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T12:05:27.3924764Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T12:05:27.3925160Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:05:27.3925408Z 2025-12-04T12:05:27.3925505Z Running distributed tests for the gloo backend with env init_method 2025-12-04T12:05:27.3925675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:05:27.3926110Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=2', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:05:27.389301] 2025-12-04T12:08:51.1799157Z 2025-12-04T12:08:51.1800371Z distributed/test_distributed_spawn 2/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_2.7_3795574493b85cde_.log 2025-12-04T12:08:51.1813151Z Running 41 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:08:51.1822096Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T12:08:51.1822726Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T12:08:51.1823306Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half 2025-12-04T12:08:51.1823873Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T12:08:51.1824400Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group 2025-12-04T12:08:51.1824813Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T12:08:51.1825239Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T12:08:51.1825649Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T12:08:51.1826046Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T12:08:51.1826488Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T12:08:51.1826870Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T12:08:51.1827246Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T12:08:51.1827601Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T12:08:51.1827957Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T12:08:51.1828327Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T12:08:51.1828748Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex 2025-12-04T12:08:51.1829191Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex 2025-12-04T12:08:51.1829654Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T12:08:51.1830008Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T12:08:51.1830414Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl 2025-12-04T12:08:51.1830780Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T12:08:51.1831159Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T12:08:51.1831569Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T12:08:51.1831944Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group 2025-12-04T12:08:51.1832293Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T12:08:51.1832658Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group 2025-12-04T12:08:51.1833021Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T12:08:51.1833425Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T12:08:51.1833859Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager 2025-12-04T12:08:51.1834267Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T12:08:51.1834651Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T12:08:51.1835002Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T12:08:51.1835345Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T12:08:51.1835683Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T12:08:51.1836022Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group 2025-12-04T12:08:51.1836396Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T12:08:51.1836741Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source 2025-12-04T12:08:51.1837110Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler 2025-12-04T12:08:51.1837476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T12:08:51.1837845Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T12:08:51.1838232Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:08:51.1838445Z 2025-12-04T12:08:51.1838536Z Running distributed tests for the gloo backend with file init_method 2025-12-04T12:08:51.1838708Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:08:51.1839138Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=2', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:08:51.180753] 2025-12-04T12:12:14.4865053Z 2025-12-04T12:12:14.4866222Z distributed/test_distributed_spawn 2/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_2.7_e5fa07b133880a43_.log 2025-12-04T12:12:14.4872977Z Running 41 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:12:14.4881505Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_gradient 2025-12-04T12:12:14.4881967Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T12:12:14.4882391Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_half 2025-12-04T12:12:14.4882813Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T12:12:14.4883210Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_SyncBatchNorm_process_group 2025-12-04T12:12:14.4883597Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T12:12:14.4884008Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T12:12:14.4884386Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T12:12:14.4884752Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T12:12:14.4885115Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T12:12:14.4885470Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_product 2025-12-04T12:12:14.4885818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T12:12:14.4886150Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_min 2025-12-04T12:12:14.4886560Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T12:12:14.4886908Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T12:12:14.4887295Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_complex 2025-12-04T12:12:14.4887717Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda_complex 2025-12-04T12:12:14.4888105Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_average_parameters 2025-12-04T12:12:14.4888439Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T12:12:14.4888776Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_nccl 2025-12-04T12:12:14.4889118Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T12:12:14.4889472Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_control_flow_same_across_ranks 2025-12-04T12:12:14.4889879Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_remove_autograd_hooks 2025-12-04T12:12:14.4890253Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_destroy_group 2025-12-04T12:12:14.4890577Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T12:12:14.4890913Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank_size_full_group 2025-12-04T12:12:14.4891321Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T12:12:14.4891699Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T12:12:14.4892096Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager 2025-12-04T12:12:14.4892472Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_periodic_model_averager_param_group 2025-12-04T12:12:14.4892847Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_sum 2025-12-04T12:12:14.4893204Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda_twice 2025-12-04T12:12:14.4893548Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_checks 2025-12-04T12:12:14.4893885Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda_complex 2025-12-04T12:12:14.4894222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_group 2025-12-04T12:12:14.4894560Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_object_list 2025-12-04T12:12:14.4894905Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source 2025-12-04T12:12:14.4895274Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_torch_profiler 2025-12-04T12:12:14.4895640Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag 2025-12-04T12:12:14.4896054Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T12:12:14.4896444Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_torch_profiler 2025-12-04T12:12:14.4896655Z 2025-12-04T12:12:14.4896789Z Finished distributed/test_distributed_spawn 2/7 ... [2025-12-04 12:12:14.487035][5228375.466071689], took 12.85min 2025-12-04T12:12:14.4897240Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:12:14.4897637Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:12:14.4897874Z Running distributed/test_distributed_spawn 5/7 ... [2025-12-04 12:12:14.489446][5228375.468488448] 2025-12-04T12:12:14.4898099Z MPI not available -- MPI backend tests will be skipped 2025-12-04T12:12:14.4898285Z Running distributed tests for the test backend with env init_method 2025-12-04T12:12:14.4898459Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:12:14.4899369Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=5', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:12:14.489840] 2025-12-04T12:12:16.3505707Z 2025-12-04T12:12:16.3506652Z distributed/test_distributed_spawn 5/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_5.7_ac3e1d0a6320b1a9_.log 2025-12-04T12:12:16.3507558Z Running 0 items in this shard: 2025-12-04T12:12:16.3507772Z 2025-12-04T12:12:16.3512943Z Running distributed tests for the test backend with file init_method 2025-12-04T12:12:16.3515309Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:12:16.3517272Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=5', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:12:16.351544] 2025-12-04T12:12:18.2100014Z 2025-12-04T12:12:18.2100906Z distributed/test_distributed_spawn 5/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_5.7_77501df2042f0bce_.log 2025-12-04T12:12:18.2101651Z Running 0 items in this shard: 2025-12-04T12:12:18.2101835Z 2025-12-04T12:12:18.2106122Z Running distributed tests for the nccl backend with env init_method 2025-12-04T12:12:18.2108515Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:12:18.2110325Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=5', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:12:18.210860] 2025-12-04T12:16:19.3652141Z 2025-12-04T12:16:19.3653199Z distributed/test_distributed_spawn 5/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_5.7_dc8d67f6dc9d3408_.log 2025-12-04T12:16:19.3666451Z Running 49 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:16:19.3676064Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T12:16:19.3676645Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process 2025-12-04T12:16:19.3677151Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding 2025-12-04T12:16:19.3677570Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T12:16:19.3677965Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T12:16:19.3678392Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T12:16:19.3678843Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T12:16:19.3679282Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T12:16:19.3679707Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T12:16:19.3680128Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T12:16:19.3680509Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops 2025-12-04T12:16:19.3680887Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex 2025-12-04T12:16:19.3681242Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda 2025-12-04T12:16:19.3681590Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T12:16:19.3681963Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex 2025-12-04T12:16:19.3682375Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T12:16:19.3682755Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global 2025-12-04T12:16:19.3683119Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T12:16:19.3683501Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T12:16:19.3683887Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T12:16:19.3684286Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params 2025-12-04T12:16:19.3684719Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer 2025-12-04T12:16:19.3685105Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T12:16:19.3685489Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T12:16:19.3685842Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T12:16:19.3686218Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler 2025-12-04T12:16:19.3686602Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T12:16:19.3686972Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T12:16:19.3687350Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T12:16:19.3687742Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T12:16:19.3688129Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T12:16:19.3688518Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars 2025-12-04T12:16:19.3688876Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T12:16:19.3689218Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T12:16:19.3689626Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T12:16:19.3690016Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 2025-12-04T12:16:19.3690424Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T12:16:19.3690861Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T12:16:19.3691279Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max 2025-12-04T12:16:19.3691620Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max 2025-12-04T12:16:19.3691957Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T12:16:19.3692295Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice 
2025-12-04T12:16:19.3692638Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T12:16:19.3692976Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T12:16:19.3693340Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T12:16:19.3693730Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T12:16:19.3694109Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T12:16:19.3694508Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp 2025-12-04T12:16:19.3694890Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:16:19.3695111Z 2025-12-04T12:16:19.3695203Z Running distributed tests for the nccl backend with file init_method 2025-12-04T12:16:19.3695375Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:16:19.3695806Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=5', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:16:19.366119] 2025-12-04T12:20:22.2374264Z 2025-12-04T12:20:22.2375426Z distributed/test_distributed_spawn 5/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_5.7_826092b9f65b8986_.log 2025-12-04T12:20:22.2389967Z Running 49 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:20:22.2399918Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T12:20:22.2400443Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process 2025-12-04T12:20:22.2400907Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding 2025-12-04T12:20:22.2401292Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T12:20:22.2401652Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T12:20:22.2402030Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T12:20:22.2402448Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T12:20:22.2402872Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T12:20:22.2403288Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T12:20:22.2403693Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T12:20:22.2404095Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops 2025-12-04T12:20:22.2404548Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex 2025-12-04T12:20:22.2404934Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda 2025-12-04T12:20:22.2405289Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T12:20:22.2405680Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex 2025-12-04T12:20:22.2406098Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T12:20:22.2406495Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global 2025-12-04T12:20:22.2406881Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T12:20:22.2407279Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T12:20:22.2407680Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T12:20:22.2408090Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params 2025-12-04T12:20:22.2408520Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer 2025-12-04T12:20:22.2408916Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T12:20:22.2409322Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T12:20:22.2409723Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T12:20:22.2410094Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler 2025-12-04T12:20:22.2410480Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T12:20:22.2410879Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T12:20:22.2411252Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T12:20:22.2411643Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T12:20:22.2412028Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T12:20:22.2412402Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars 2025-12-04T12:20:22.2412761Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T12:20:22.2413101Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T12:20:22.2413444Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T12:20:22.2413830Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 2025-12-04T12:20:22.2414273Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T12:20:22.2414709Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T12:20:22.2415125Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max 2025-12-04T12:20:22.2415463Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max 2025-12-04T12:20:22.2415789Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T12:20:22.2416123Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice 
2025-12-04T12:20:22.2416469Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T12:20:22.2416806Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T12:20:22.2417166Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T12:20:22.2417558Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T12:20:22.2418031Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T12:20:22.2418392Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp 2025-12-04T12:20:22.2418795Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:20:22.2419023Z 2025-12-04T12:20:22.2419109Z Running distributed tests for the gloo backend with env init_method 2025-12-04T12:20:22.2419279Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:20:22.2419746Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=5', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:22.238291] 2025-12-04T12:24:27.6468820Z 2025-12-04T12:24:27.6470240Z distributed/test_distributed_spawn 5/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_5.7_13c969534f9043ac_.log 2025-12-04T12:24:27.6486846Z Running 49 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:24:27.6496710Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T12:24:27.6497281Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process 2025-12-04T12:24:27.6497778Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding 2025-12-04T12:24:27.6498182Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T12:24:27.6498562Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T12:24:27.6498974Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T12:24:27.6499431Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T12:24:27.6499921Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T12:24:27.6500320Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T12:24:27.6500694Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T12:24:27.6501133Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops 2025-12-04T12:24:27.6501512Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex 2025-12-04T12:24:27.6501893Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda 2025-12-04T12:24:27.6502234Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T12:24:27.6502602Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex 2025-12-04T12:24:27.6503003Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T12:24:27.6503380Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global 2025-12-04T12:24:27.6503748Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T12:24:27.6504131Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T12:24:27.6504515Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T12:24:27.6504907Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params 2025-12-04T12:24:27.6505297Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer 2025-12-04T12:24:27.6505677Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T12:24:27.6506055Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T12:24:27.6506410Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T12:24:27.6506816Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler 2025-12-04T12:24:27.6507200Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T12:24:27.6507567Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T12:24:27.6507940Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T12:24:27.6508324Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T12:24:27.6508704Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T12:24:27.6509081Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars 2025-12-04T12:24:27.6509438Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T12:24:27.6509814Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T12:24:27.6510158Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T12:24:27.6510565Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 2025-12-04T12:24:27.6510973Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T12:24:27.6511427Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T12:24:27.6511842Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max 2025-12-04T12:24:27.6512178Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max 2025-12-04T12:24:27.6512508Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T12:24:27.6512849Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice 
2025-12-04T12:24:27.6513190Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T12:24:27.6513523Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T12:24:27.6513888Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T12:24:27.6514278Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T12:24:27.6514651Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T12:24:27.6515016Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp 2025-12-04T12:24:27.6515399Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:24:27.6515619Z 2025-12-04T12:24:27.6515707Z Running distributed tests for the gloo backend with file init_method 2025-12-04T12:24:27.6515972Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:24:27.6516496Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=5', '--num-shards=7', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:24:27.648439] 2025-12-04T12:28:33.3971719Z 2025-12-04T12:28:33.3972828Z distributed/test_distributed_spawn 5/7 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_5.7_3f8f9b2ee17a8cb7_.log 2025-12-04T12:28:33.3981221Z Running 49 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:28:33.3988986Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T12:28:33.3989521Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Single_Input_Per_Process 2025-12-04T12:28:33.3990055Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedSampler_padding 2025-12-04T12:28:33.3990458Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather 2025-12-04T12:28:33.3990826Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T12:28:33.3991217Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T12:28:33.3991679Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T12:28:33.3992308Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T12:28:33.3992742Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_product 2025-12-04T12:28:33.3993145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_sum 2025-12-04T12:28:33.3993548Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_complex_unsupported_ops 2025-12-04T12:28:33.3993963Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_complex 2025-12-04T12:28:33.3994345Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda 2025-12-04T12:28:33.3994730Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T12:28:33.3995181Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_complex 2025-12-04T12:28:33.3995609Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_cuda 2025-12-04T12:28:33.3996026Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_global 2025-12-04T12:28:33.3996818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T12:28:33.3997224Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T12:28:33.3997650Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_self_nccl 2025-12-04T12:28:33.3998073Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_apply_optim_in_backward_ignored_params 2025-12-04T12:28:33.3998495Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_broadcast_buffer 2025-12-04T12:28:33.3998911Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_allreduce_process_group 2025-12-04T12:28:33.3999319Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T12:28:33.3999836Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T12:28:33.4000231Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_autograd_profiler 2025-12-04T12:28:33.4000646Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T12:28:33.4001081Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T12:28:33.4001476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_returns_tensor_with_no_grad 2025-12-04T12:28:33.4001906Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T12:28:33.4002321Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T12:28:33.4002720Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_dump_DDP_relevant_env_vars 2025-12-04T12:28:33.4003119Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T12:28:33.4003487Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_object 2025-12-04T12:28:33.4003873Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T12:28:33.4004287Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_group_size_exceeds_world_size 2025-12-04T12:28:33.4004732Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T12:28:33.4005966Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_post_localSGD_optimizer_parity_with_hierarchical_sgd_grad_is_view 2025-12-04T12:28:33.4006406Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_max 2025-12-04T12:28:33.4006813Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_max 2025-12-04T12:28:33.4007188Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T12:28:33.4007555Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_twice 
2025-12-04T12:28:33.4007927Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_full_group 2025-12-04T12:28:33.4008297Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv 2025-12-04T12:28:33.4008691Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_any_source_autograd_profiler 2025-12-04T12:28:33.4009129Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T12:28:33.4009546Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_sparse_all_reduce_sum_cuda 2025-12-04T12:28:33.4009981Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_stateless_api_with_ddp 2025-12-04T12:28:33.4010398Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T12:28:33.4010655Z 2025-12-04T12:28:33.4010808Z Finished distributed/test_distributed_spawn 5/7 ... [2025-12-04 12:28:33.397987][5229354.377023565], took 16.32min 2025-12-04T12:28:33.4011299Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:28:33.4011738Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:28:33.4012022Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:28:33.4021622Z Uploading artifacts took 0.00 seconds 2025-12-04T12:28:33.4021836Z Running distributed/fsdp/test_fsdp_input 1/1 ... [2025-12-04 12:28:33.400657][5229354.3796988] 2025-12-04T12:28:33.4022074Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:28:33.4022491Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_input.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:28:33.400843] 2025-12-04T12:29:30.3417249Z 2025-12-04T12:29:30.3418187Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_input 1/1 (test/test-reports/distributed.fsdp.test_fsdp_input_1.1_bc379566c9ef67b0_.log) 2025-12-04T12:29:30.3419556Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-e478acc36cfea895.xml 2025-12-04T12:29:30.3420582Z ============================= test session starts ============================== 2025-12-04T12:29:30.3421224Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3421752Z cachedir: .pytest_cache 2025-12-04T12:29:30.3423630Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3424311Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3424642Z configfile: pytest.ini 2025-12-04T12:29:30.3425326Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3425978Z collecting ... 
collected 2 items 2025-12-04T12:29:30.3426243Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:29:30.3427623Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda, test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda 2025-12-04T12:29:30.3428227Z 2025-12-04T12:29:30.3428803Z distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda I1204 12:28:35.242000 317063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 317132 2025-12-04T12:29:30.3430032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T12:29:30.3430798Z _init_core_state( 2025-12-04T12:29:30.3433382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
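[Editor's note] For context on the "Started process 0 with pid ..." line above and on the earlier "gloo backend with env init_method" / "file init_method" runs in this shard: the spawn-based distributed tests launch one worker per rank and have each worker join a process group before running the test body. Below is a minimal sketch of that pattern under assumed settings (local single-rank group, hypothetical MASTER_PORT of 29500); the real harness in torch/testing/_internal/common_distributed.py differs in detail. A file init_method run would differ only in the rendezvous URL, e.g. init_method="file:///tmp/some_shared_file".

    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def _worker(rank: int, world_size: int) -> None:
        # Each spawned worker joins the group via the env:// rendezvous,
        # runs its share of the test, and tears the group down on exit.
        dist.init_process_group("gloo", init_method="env://",
                                rank=rank, world_size=world_size)
        try:
            pass  # test body would run here
        finally:
            dist.destroy_process_group()

    if __name__ == "__main__":
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")  # hypothetical port, not from this job
        mp.spawn(_worker, args=(1,), nprocs=1)
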
2025-12-04T12:29:30.3436188Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:29:30.3436618Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:30.3437238Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:30.3437941Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3438615Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:30.3439292Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3439989Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:30.3440615Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3441275Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3441931Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3442578Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3443230Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3443940Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:30.3444585Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3445234Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:30.3446108Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
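[Editor's note] The RuntimeError above comes from the CUDA memory-leak check that this job enables (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 in the repro line that follows). Conceptually, it snapshots caching-allocator usage before the test and compares afterwards. The sketch below only illustrates that idea with made-up names and requires a CUDA/ROCm device; the actual check in torch/testing/_internal/common_utils.py also tracks driver-level memory and retries to filter out noise.

    import torch

    def run_with_leak_check(test_fn, device: int = 0) -> None:
        # Illustrative only: compare caching-allocator usage before and after the test.
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)
        test_fn()
        torch.cuda.synchronize(device)
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak: caching allocator allocated memory was {before} "
                f"and is now reported as {after} on device {device}"
            )
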
2025-12-04T12:29:30.3446975Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3447367Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3447968Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3448500Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3449016Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3449474Z [rank0]:E1204 12:28:40.445000 317132 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:30.3449815Z dist init r=0, world=1 2025-12-04T12:29:30.3450273Z [rank0]:[W1204 12:28:40.607400974 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:30.3450719Z FAILED [6.7122s] [ 50%] 2025-12-04T12:29:30.3450796Z 2025-12-04T12:29:30.3450859Z =================================== FAILURES =================================== 2025-12-04T12:29:30.3451061Z ___________________ TestInputCUDA.test_input_type_dict_cuda ____________________ 2025-12-04T12:29:30.3451249Z Traceback (most recent call last): 2025-12-04T12:29:30.3451524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:30.3451794Z self._join_processes(fn) 2025-12-04T12:29:30.3452075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:30.3452372Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:30.3452666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:30.3452959Z raise RuntimeError(error) 2025-12-04T12:29:30.3453125Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3453304Z Traceback (most recent call last): 2025-12-04T12:29:30.3453570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3453838Z getattr(self, test_name)() 2025-12-04T12:29:30.3454100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3454365Z fn() 2025-12-04T12:29:30.3454641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3454900Z method(*args, **kwargs) 2025-12-04T12:29:30.3455148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3455402Z method(*args, **kwargs) 2025-12-04T12:29:30.3455650Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3455902Z with policy(): 2025-12-04T12:29:30.3456134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3456390Z raise RuntimeError(msg) 2025-12-04T12:29:30.3456791Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 2025-12-04T12:29:30.3457136Z 2025-12-04T12:29:30.3457210Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3457512Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3457735Z 2025-12-04T12:29:30.3457825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3457952Z 2025-12-04T12:29:30.3457953Z 2025-12-04T12:29:30.3458052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:30.3458253Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:29:30.3458623Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-e478acc36cfea895.xml - 2025-12-04T12:29:30.3458977Z =========================== short test summary info ============================ 2025-12-04T12:29:30.3459283Z FAILED [6.7122s] distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3459616Z Traceback (most recent call last): 2025-12-04T12:29:30.3459864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3460114Z getattr(self, test_name)() 2025-12-04T12:29:30.3460347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3460586Z fn() 2025-12-04T12:29:30.3460792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3461021Z method(*args, **kwargs) 2025-12-04T12:29:30.3461241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3461470Z method(*args, **kwargs) 2025-12-04T12:29:30.3461686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3461911Z with policy(): 2025-12-04T12:29:30.3462122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3462352Z raise RuntimeError(msg) 2025-12-04T12:29:30.3462729Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3463065Z 2025-12-04T12:29:30.3463142Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3463473Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3463692Z 2025-12-04T12:29:30.3463778Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3463965Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:30.3464121Z ============================== 1 failed in 6.72s =============================== 2025-12-04T12:29:30.3464253Z Got exit code 1 2025-12-04T12:29:30.3464349Z Retrying single test... 2025-12-04T12:29:30.3464609Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-22b8fd4eb01f818e.xml 2025-12-04T12:29:30.3464891Z ============================= test session starts ============================== 2025-12-04T12:29:30.3465099Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3465289Z cachedir: .pytest_cache 2025-12-04T12:29:30.3465514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3465752Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3465868Z configfile: pytest.ini 2025-12-04T12:29:30.3466096Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3466364Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T12:29:30.3466648Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda 2025-12-04T12:29:30.3466938Z Running 1 items in this shard 2025-12-04T12:29:30.3467012Z 2025-12-04T12:29:30.3467276Z distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda I1204 12:28:44.460000 317215 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 317284 2025-12-04T12:29:30.3467893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T12:29:30.3468260Z _init_core_state( 2025-12-04T12:29:30.3469669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:29:30.3471088Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:29:30.3471394Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:30.3471741Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:30.3472240Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3472752Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:30.3473230Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3473673Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:30.3474111Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3474571Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3475033Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3475498Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3475964Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3476429Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:30.3476886Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3477366Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:30.3477982Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
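[Editor's note] For scale, the figures repeated in each failure are tiny on the allocator side but large at the driver level (assuming both are byte counts, as torch.cuda reports them):

    # Values copied from the RuntimeError above.
    alloc_before, alloc_after = 512, 1024           # caching allocator, bytes
    drv_before, drv_after = 1633681408, 2130706432  # driver-reported, bytes
    print(alloc_after - alloc_before)               # 512 bytes of allocator growth
    print((drv_after - drv_before) / 2**20)         # 474.0 MiB of driver growth
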
2025-12-04T12:29:30.3478558Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3478907Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3479455Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3479955Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3480321Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3480732Z [rank0]:E1204 12:28:49.624000 317284 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:30.3480973Z dist init r=0, world=1 2025-12-04T12:29:30.3481372Z [rank0]:[W1204 12:28:49.781038386 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:30.3481783Z FAILED [6.7126s] [100%] 2025-12-04T12:29:30.3481848Z 2025-12-04T12:29:30.3481936Z =================================== FAILURES =================================== 2025-12-04T12:29:30.3482134Z ___________________ TestInputCUDA.test_input_type_dict_cuda ____________________ 2025-12-04T12:29:30.3482307Z Traceback (most recent call last): 2025-12-04T12:29:30.3482559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:30.3482811Z self._join_processes(fn) 2025-12-04T12:29:30.3483066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:30.3483337Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:30.3483612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:30.3483880Z raise RuntimeError(error) 2025-12-04T12:29:30.3484039Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3484206Z Traceback (most recent call last): 2025-12-04T12:29:30.3484453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3484702Z getattr(self, test_name)() 2025-12-04T12:29:30.3484941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3485199Z fn() 2025-12-04T12:29:30.3485408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3485643Z method(*args, **kwargs) 2025-12-04T12:29:30.3485870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3486124Z method(*args, **kwargs) 2025-12-04T12:29:30.3486354Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3486585Z with policy(): 2025-12-04T12:29:30.3486802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3487039Z raise RuntimeError(msg) 2025-12-04T12:29:30.3487416Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 2025-12-04T12:29:30.3487761Z 2025-12-04T12:29:30.3487842Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3488143Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3488366Z 2025-12-04T12:29:30.3488467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3488593Z 2025-12-04T12:29:30.3488594Z 2025-12-04T12:29:30.3488679Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:30.3488884Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:29:30.3489251Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-22b8fd4eb01f818e.xml - 2025-12-04T12:29:30.3489623Z =========================== short test summary info ============================ 2025-12-04T12:29:30.3489927Z FAILED [6.7126s] distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3490213Z Traceback (most recent call last): 2025-12-04T12:29:30.3490461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3490743Z getattr(self, test_name)() 2025-12-04T12:29:30.3490982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3491217Z fn() 2025-12-04T12:29:30.3491424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3491659Z method(*args, **kwargs) 2025-12-04T12:29:30.3491886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3492120Z method(*args, **kwargs) 2025-12-04T12:29:30.3492343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3492575Z with policy(): 2025-12-04T12:29:30.3492795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3493034Z raise RuntimeError(msg) 2025-12-04T12:29:30.3493410Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3493748Z 2025-12-04T12:29:30.3493827Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3494126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3494369Z 2025-12-04T12:29:30.3494460Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3494654Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:30.3494841Z ======================= 1 failed, 1 deselected in 6.72s ======================== 2025-12-04T12:29:30.3494987Z Got exit code 1 2025-12-04T12:29:30.3495090Z Retrying single test... 2025-12-04T12:29:30.3495356Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-c34eff7f3645e485.xml 2025-12-04T12:29:30.3495646Z ============================= test session starts ============================== 2025-12-04T12:29:30.3495862Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3496060Z cachedir: .pytest_cache 2025-12-04T12:29:30.3496291Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3496537Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3496663Z configfile: pytest.ini 2025-12-04T12:29:30.3496893Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3497171Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T12:29:30.3497479Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda 2025-12-04T12:29:30.3497740Z Running 1 items in this shard 2025-12-04T12:29:30.3497818Z 2025-12-04T12:29:30.3498087Z distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda I1204 12:28:53.545000 317367 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 317436 2025-12-04T12:29:30.3498697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T12:29:30.3499075Z _init_core_state( 2025-12-04T12:29:30.3500513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:29:30.3501947Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:29:30.3502259Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:30.3502602Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:30.3503097Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3503601Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:30.3504086Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3504580Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:30.3505025Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3505492Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3505962Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3506434Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3506905Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3507358Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:30.3507816Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3508286Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:30.3508935Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
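[Editor's note] The AccumulateGrad stream-mismatch UserWarning printed at the start of every attempt is informational here; as the warning text itself states, it can be silenced once the mismatch is known to be intentional:

    import torch

    # Suppress the stream-mismatch warning, per the guidance quoted in the warning above.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)
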
2025-12-04T12:29:30.3509518Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3509950Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3510497Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3510963Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3511334Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3511756Z [rank0]:E1204 12:28:58.730000 317436 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:30.3512005Z dist init r=0, world=1 2025-12-04T12:29:30.3512408Z [rank0]:[W1204 12:28:58.892768481 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:30.3512842Z FAILED [6.7122s] [100%] 2025-12-04T12:29:30.3512906Z 2025-12-04T12:29:30.3512967Z =================================== FAILURES =================================== 2025-12-04T12:29:30.3513149Z ___________________ TestInputCUDA.test_input_type_dict_cuda ____________________ 2025-12-04T12:29:30.3513318Z Traceback (most recent call last): 2025-12-04T12:29:30.3513587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:30.3513836Z self._join_processes(fn) 2025-12-04T12:29:30.3514089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:30.3514361Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:30.3514635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:30.3514905Z raise RuntimeError(error) 2025-12-04T12:29:30.3515063Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3515227Z Traceback (most recent call last): 2025-12-04T12:29:30.3515474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3515721Z getattr(self, test_name)() 2025-12-04T12:29:30.3515963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3516201Z fn() 2025-12-04T12:29:30.3516409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3516646Z method(*args, **kwargs) 2025-12-04T12:29:30.3516875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3517112Z method(*args, **kwargs) 2025-12-04T12:29:30.3517336Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3517573Z with policy(): 2025-12-04T12:29:30.3517789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3518032Z raise RuntimeError(msg) 2025-12-04T12:29:30.3518442Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 2025-12-04T12:29:30.3518782Z 2025-12-04T12:29:30.3518862Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3519164Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3519386Z 2025-12-04T12:29:30.3519480Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3519661Z 2025-12-04T12:29:30.3519663Z 2025-12-04T12:29:30.3519743Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:30.3519947Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:29:30.3520313Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-c34eff7f3645e485.xml - 2025-12-04T12:29:30.3520649Z =========================== short test summary info ============================ 2025-12-04T12:29:30.3520951Z FAILED [6.7122s] distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3521234Z Traceback (most recent call last): 2025-12-04T12:29:30.3521488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3521760Z getattr(self, test_name)() 2025-12-04T12:29:30.3521999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3522239Z fn() 2025-12-04T12:29:30.3522447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3522705Z method(*args, **kwargs) 2025-12-04T12:29:30.3522934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3523172Z method(*args, **kwargs) 2025-12-04T12:29:30.3523399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3523634Z with policy(): 2025-12-04T12:29:30.3523853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3524091Z raise RuntimeError(msg) 2025-12-04T12:29:30.3524467Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3524812Z 2025-12-04T12:29:30.3524889Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3525191Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_dict_cuda 2025-12-04T12:29:30.3525414Z 2025-12-04T12:29:30.3525504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3525697Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:30.3525869Z ======================= 1 failed, 1 deselected in 6.72s ======================== 2025-12-04T12:29:30.3526011Z Got exit code 1 2025-12-04T12:29:30.3526209Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda 2025-12-04T12:29:30.3526514Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:29:30.3526913Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-6433fd4e4679d804.xml 2025-12-04T12:29:30.3527204Z ============================= test session starts ============================== 2025-12-04T12:29:30.3527420Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3527616Z cachedir: .pytest_cache 2025-12-04T12:29:30.3527846Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3528094Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3528219Z configfile: pytest.ini 2025-12-04T12:29:30.3528456Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3528732Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T12:29:30.3528898Z stepcurrent: skipping 1 already run items. 2025-12-04T12:29:30.3529034Z Running 1 items in this shard 2025-12-04T12:29:30.3529112Z 2025-12-04T12:29:30.3529385Z distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda I1204 12:29:02.597000 317519 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 317588 2025-12-04T12:29:30.3530023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T12:29:30.3530419Z _init_core_state( 2025-12-04T12:29:30.3531758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:29:30.3533188Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:29:30.3533497Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:30.3533842Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:30.3534336Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3534819Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:30.3535302Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3535758Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:30.3536238Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3536706Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3537168Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3537628Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3538093Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3538540Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:30.3538998Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3539460Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:30.3540116Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3540710Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3541075Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3541618Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3542078Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3542446Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3542859Z [rank0]:E1204 12:29:07.825000 317588 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:30.3543100Z dist init r=0, world=1 2025-12-04T12:29:30.3543500Z [rank0]:[W1204 12:29:07.976226866 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:30.3543908Z FAILED [6.8121s] [100%] 2025-12-04T12:29:30.3543970Z 2025-12-04T12:29:30.3544027Z =================================== FAILURES =================================== 2025-12-04T12:29:30.3544203Z ___________________ TestInputCUDA.test_input_type_list_cuda ____________________ 2025-12-04T12:29:30.3544367Z Traceback (most recent call last): 2025-12-04T12:29:30.3544609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:30.3544854Z self._join_processes(fn) 2025-12-04T12:29:30.3545098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:30.3545392Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:30.3545662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:30.3545921Z raise RuntimeError(error) 2025-12-04T12:29:30.3546069Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3546227Z Traceback (most recent call last): 2025-12-04T12:29:30.3546466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3546707Z getattr(self, test_name)() 2025-12-04T12:29:30.3546939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3547170Z fn() 2025-12-04T12:29:30.3547370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3547599Z method(*args, **kwargs) 2025-12-04T12:29:30.3547820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3548049Z method(*args, **kwargs) 2025-12-04T12:29:30.3548266Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3548490Z with policy(): 2025-12-04T12:29:30.3548700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3548944Z raise RuntimeError(msg) 2025-12-04T12:29:30.3549313Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 2025-12-04T12:29:30.3549702Z 2025-12-04T12:29:30.3549780Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3550074Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3550293Z 2025-12-04T12:29:30.3550382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3550506Z 2025-12-04T12:29:30.3550511Z 2025-12-04T12:29:30.3550587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:30.3550786Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:29:30.3551147Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-6433fd4e4679d804.xml - 2025-12-04T12:29:30.3551476Z =========================== short test summary info ============================ 2025-12-04T12:29:30.3551778Z FAILED [6.8121s] distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3552054Z Traceback (most recent call last): 2025-12-04T12:29:30.3552299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3552543Z getattr(self, test_name)() 2025-12-04T12:29:30.3552779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3553009Z fn() 2025-12-04T12:29:30.3553212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3553439Z method(*args, **kwargs) 2025-12-04T12:29:30.3553657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3553886Z method(*args, **kwargs) 2025-12-04T12:29:30.3554143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3554368Z with policy(): 2025-12-04T12:29:30.3554581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3554812Z raise RuntimeError(msg) 2025-12-04T12:29:30.3555181Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3555519Z 2025-12-04T12:29:30.3555593Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3555889Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3556110Z 2025-12-04T12:29:30.3556199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3556385Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:30.3556547Z ======================= 1 failed, 1 deselected in 6.82s ======================== 2025-12-04T12:29:30.3556682Z Got exit code 1 2025-12-04T12:29:30.3556777Z Retrying single test... 2025-12-04T12:29:30.3557033Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-6a4db284eb052079.xml 2025-12-04T12:29:30.3557333Z ============================= test session starts ============================== 2025-12-04T12:29:30.3557541Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3557729Z cachedir: .pytest_cache 2025-12-04T12:29:30.3557954Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3558209Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3558326Z configfile: pytest.ini 2025-12-04T12:29:30.3558550Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3558817Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T12:29:30.3559102Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda 2025-12-04T12:29:30.3559356Z Running 1 items in this shard 2025-12-04T12:29:30.3559429Z 2025-12-04T12:29:30.3559737Z distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda I1204 12:29:11.736000 317671 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 317740 2025-12-04T12:29:30.3560336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T12:29:30.3560709Z _init_core_state( 2025-12-04T12:29:30.3562073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:29:30.3563483Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:29:30.3563788Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:30.3564126Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:30.3564615Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3565098Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:30.3565578Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3566024Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:30.3566463Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3566942Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3567405Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3567887Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3568349Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3568796Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:30.3569262Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3569770Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:30.3570399Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3570976Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3571326Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3571869Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3572366Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3572732Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3573142Z [rank0]:E1204 12:29:17.014000 317740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:30.3573382Z dist init r=0, world=1 2025-12-04T12:29:30.3573782Z [rank0]:[W1204 12:29:17.177114576 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:30.3574188Z FAILED [6.8120s] [100%] 2025-12-04T12:29:30.3574251Z 2025-12-04T12:29:30.3574307Z =================================== FAILURES =================================== 2025-12-04T12:29:30.3574487Z ___________________ TestInputCUDA.test_input_type_list_cuda ____________________ 2025-12-04T12:29:30.3574649Z Traceback (most recent call last): 2025-12-04T12:29:30.3574892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:30.3575133Z self._join_processes(fn) 2025-12-04T12:29:30.3575375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:30.3575655Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:30.3575923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:30.3576184Z raise RuntimeError(error) 2025-12-04T12:29:30.3576333Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3576505Z Traceback (most recent call last): 2025-12-04T12:29:30.3576745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3576984Z getattr(self, test_name)() 2025-12-04T12:29:30.3577216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3577447Z fn() 2025-12-04T12:29:30.3577648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3577880Z method(*args, **kwargs) 2025-12-04T12:29:30.3578100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3578327Z method(*args, **kwargs) 2025-12-04T12:29:30.3578546Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3578773Z with policy(): 2025-12-04T12:29:30.3578985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3579215Z raise RuntimeError(msg) 2025-12-04T12:29:30.3579622Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 2025-12-04T12:29:30.3579960Z 2025-12-04T12:29:30.3580035Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3580330Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3580550Z 2025-12-04T12:29:30.3580637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3580763Z 2025-12-04T12:29:30.3580765Z 2025-12-04T12:29:30.3580879Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:30.3581075Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:29:30.3581441Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-6a4db284eb052079.xml - 2025-12-04T12:29:30.3581771Z =========================== short test summary info ============================ 2025-12-04T12:29:30.3582076Z FAILED [6.8120s] distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3582353Z Traceback (most recent call last): 2025-12-04T12:29:30.3582597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3582840Z getattr(self, test_name)() 2025-12-04T12:29:30.3583077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3583308Z fn() 2025-12-04T12:29:30.3583510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3583738Z method(*args, **kwargs) 2025-12-04T12:29:30.3583958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3584205Z method(*args, **kwargs) 2025-12-04T12:29:30.3584422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3584648Z with policy(): 2025-12-04T12:29:30.3584858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3585113Z raise RuntimeError(msg) 2025-12-04T12:29:30.3585485Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3585823Z 2025-12-04T12:29:30.3585896Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3586192Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3586412Z 2025-12-04T12:29:30.3586499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3586685Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:30.3586847Z ======================= 1 failed, 1 deselected in 6.82s ======================== 2025-12-04T12:29:30.3586983Z Got exit code 1 2025-12-04T12:29:30.3587077Z Retrying single test... 2025-12-04T12:29:30.3587338Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-c38a48976e3e60de.xml 2025-12-04T12:29:30.3587621Z ============================= test session starts ============================== 2025-12-04T12:29:30.3587831Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3588018Z cachedir: .pytest_cache 2025-12-04T12:29:30.3588241Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3588482Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3588601Z configfile: pytest.ini 2025-12-04T12:29:30.3588830Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3589099Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T12:29:30.3589412Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda 2025-12-04T12:29:30.3589696Z Running 1 items in this shard 2025-12-04T12:29:30.3589769Z 2025-12-04T12:29:30.3590032Z distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda I1204 12:29:20.848000 317823 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 317892 2025-12-04T12:29:30.3590625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T12:29:30.3590996Z _init_core_state( 2025-12-04T12:29:30.3592340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:29:30.3593763Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:29:30.3594083Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:30.3594425Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:30.3594912Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3595393Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:30.3595873Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3596317Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:30.3596761Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3597222Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3597684Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3598144Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:30.3598635Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3599086Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:30.3599538Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3600043Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:30.3600669Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3601256Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3601611Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3602152Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3602637Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:30.3603001Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3603430Z [rank0]:E1204 12:29:26.078000 317892 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:30.3603671Z dist init r=0, world=1 2025-12-04T12:29:30.3604073Z [rank0]:[W1204 12:29:26.242068674 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:30.3604487Z FAILED [6.7124s] [100%] 2025-12-04T12:29:30.3604549Z 2025-12-04T12:29:30.3604608Z =================================== FAILURES =================================== 2025-12-04T12:29:30.3604783Z ___________________ TestInputCUDA.test_input_type_list_cuda ____________________ 2025-12-04T12:29:30.3604946Z Traceback (most recent call last): 2025-12-04T12:29:30.3605191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:30.3605433Z self._join_processes(fn) 2025-12-04T12:29:30.3605679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:30.3605943Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:30.3606213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:30.3606471Z raise RuntimeError(error) 2025-12-04T12:29:30.3606621Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3606783Z Traceback (most recent call last): 2025-12-04T12:29:30.3607023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3607263Z getattr(self, test_name)() 2025-12-04T12:29:30.3607494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3607727Z fn() 2025-12-04T12:29:30.3607962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3608193Z method(*args, **kwargs) 2025-12-04T12:29:30.3608416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3608645Z method(*args, **kwargs) 2025-12-04T12:29:30.3608861Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3609088Z with policy(): 2025-12-04T12:29:30.3609300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3609530Z raise RuntimeError(msg) 2025-12-04T12:29:30.3609946Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 2025-12-04T12:29:30.3610287Z 2025-12-04T12:29:30.3610360Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3610653Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3610874Z 2025-12-04T12:29:30.3610962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3611105Z 2025-12-04T12:29:30.3611107Z 2025-12-04T12:29:30.3611182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:30.3611379Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:29:30.3611744Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-c38a48976e3e60de.xml - 2025-12-04T12:29:30.3612093Z =========================== short test summary info ============================ 2025-12-04T12:29:30.3612392Z FAILED [6.7124s] distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:30.3612669Z Traceback (most recent call last): 2025-12-04T12:29:30.3612913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:30.3613156Z getattr(self, test_name)() 2025-12-04T12:29:30.3613389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:30.3613619Z fn() 2025-12-04T12:29:30.3613820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3614049Z method(*args, **kwargs) 2025-12-04T12:29:30.3614269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:30.3614496Z method(*args, **kwargs) 2025-12-04T12:29:30.3614713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:30.3614937Z with policy(): 2025-12-04T12:29:30.3615150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:30.3615385Z raise RuntimeError(msg) 2025-12-04T12:29:30.3615757Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestInputCUDA.test_input_type_list_cuda! Caching allocator allocated memory was 512 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1633681408 and is now 2130706432. 
2025-12-04T12:29:30.3616092Z 2025-12-04T12:29:30.3616166Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:30.3616500Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_input.py TestInputCUDA.test_input_type_list_cuda 2025-12-04T12:29:30.3616718Z 2025-12-04T12:29:30.3616807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:30.3616992Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:30.3617153Z ======================= 1 failed, 1 deselected in 6.72s ======================== 2025-12-04T12:29:30.3617290Z Got exit code 1 2025-12-04T12:29:30.3617479Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda 2025-12-04T12:29:30.3617776Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:29:30.3618130Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-3088ee34454b69a4.xml 2025-12-04T12:29:30.3618416Z ============================= test session starts ============================== 2025-12-04T12:29:30.3618623Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:30.3618808Z cachedir: .pytest_cache 2025-12-04T12:29:30.3619028Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:30.3619264Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:30.3619380Z configfile: pytest.ini 2025-12-04T12:29:30.3619668Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:30.3619935Z collecting ... collected 2 items / 2 deselected / 0 selected 2025-12-04T12:29:30.3620092Z stepcurrent: skipping 2 already run items. 2025-12-04T12:29:30.3620219Z Running 0 items in this shard 2025-12-04T12:29:30.3620309Z 2025-12-04T12:29:30.3620552Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_input/distributed.fsdp.test_fsdp_input-3088ee34454b69a4.xml - 2025-12-04T12:29:30.3620879Z ============================ 2 deselected in 0.00s ============================= 2025-12-04T12:29:30.3621251Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_dict_cuda', 'test/distributed/fsdp/test_fsdp_input.py::TestInputCUDA::test_input_type_list_cuda'] 2025-12-04T12:29:30.3621559Z 2025-12-04T12:29:30.3621749Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_input 1/1 (test/test-reports/distributed.fsdp.test_fsdp_input_1.1_bc379566c9ef67b0_.log) 2025-12-04T12:29:30.3621970Z 2025-12-04T12:29:30.3622096Z Finished distributed/fsdp/test_fsdp_input 1/1 ... 
[2025-12-04 12:29:30.341923][5229411.320957789], took 0.95min 2025-12-04T12:29:30.3622521Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:29:30.3622921Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:29:30.3623135Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:29:30.3623314Z Uploading artifacts took 0.00 seconds 2025-12-04T12:29:30.3623447Z distributed/fsdp/test_fsdp_input 1/1 failed! 2025-12-04T12:29:30.3623652Z Running distributed/fsdp/test_fsdp_traversal 1/1 ... [2025-12-04 12:29:30.345159][5229411.324200936] 2025-12-04T12:29:30.3623854Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:29:30.3624259Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_traversal.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:29:30.345377] 2025-12-04T12:29:55.5412851Z 2025-12-04T12:29:55.5417008Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_traversal 1/1 (test/test-reports/distributed.fsdp.test_fsdp_traversal_1.1_61a5791dc0397606_.log) 2025-12-04T12:29:55.5417967Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-0465fcbb0894d830.xml 2025-12-04T12:29:55.5418521Z ============================= test session starts ============================== 2025-12-04T12:29:55.5418929Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:55.5419286Z cachedir: .pytest_cache 2025-12-04T12:29:55.5419756Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:55.5420194Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:55.5420404Z configfile: pytest.ini 2025-12-04T12:29:55.5420819Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:55.5421270Z collecting ... 
collected 1 item 2025-12-04T12:29:55.5421516Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:29:55.5422006Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda 2025-12-04T12:29:55.5422344Z 2025-12-04T12:29:55.5422852Z distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda I1204 12:29:32.120000 318043 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 318112 2025-12-04T12:29:55.5423766Z I1204 12:29:32.120000 318043 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 318113 2025-12-04T12:29:55.5424366Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:55.5424977Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:55.5425780Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5426523Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:55.5427178Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5427829Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:55.5436507Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5437038Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5437599Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5438107Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5438628Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5439117Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:55.5439784Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5440293Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:55.5440987Z [rank1]:E1204 12:29:35.844000 318113 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5441623Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5442015Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5442622Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5443131Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5443563Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5444016Z [rank1]:E1204 12:29:35.844000 318113 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:29:55.5444300Z dist init r=1, world=2 2025-12-04T12:29:55.5444535Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:55.5444902Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:55.5445402Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5445884Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:55.5446359Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5446805Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:55.5447247Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5447708Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5448174Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5448636Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5449125Z [rank0]:E1204 12:29:35.857000 318112 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5449623Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:55.5450075Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5450545Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:55.5451166Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 2025-12-04T12:29:55.5451748Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5452096Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5452646Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5453134Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5453494Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5453926Z [rank0]:E1204 12:29:35.857000 318112 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:55.5454171Z dist init r=0, world=2 2025-12-04T12:29:55.5454585Z [rank0]:[W1204 12:29:36.092753861 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:55.5454991Z FAILED [5.2119s] [100%] 2025-12-04T12:29:55.5455059Z 2025-12-04T12:29:55.5455118Z =================================== FAILURES =================================== 2025-12-04T12:29:55.5455300Z ___________________ TestTraversalCUDA.test_fsdp_modules_cuda ___________________ 2025-12-04T12:29:55.5455470Z Traceback (most recent call last): 2025-12-04T12:29:55.5455722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:55.5455970Z self._join_processes(fn) 2025-12-04T12:29:55.5456219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:55.5456485Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:55.5456754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:55.5457014Z raise RuntimeError(error) 2025-12-04T12:29:55.5457167Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:55.5457331Z Traceback (most recent call last): 2025-12-04T12:29:55.5457570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5457811Z getattr(self, test_name)() 2025-12-04T12:29:55.5458080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5458316Z fn() 2025-12-04T12:29:55.5458519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5458751Z method(*args, **kwargs) 2025-12-04T12:29:55.5458976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5459207Z method(*args, **kwargs) 2025-12-04T12:29:55.5459429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5459697Z with policy(): 2025-12-04T12:29:55.5459910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5460141Z raise RuntimeError(msg) 2025-12-04T12:29:55.5460524Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 
2025-12-04T12:29:55.5460869Z 2025-12-04T12:29:55.5460943Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5461247Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5461478Z 2025-12-04T12:29:55.5461586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5461714Z 2025-12-04T12:29:55.5461773Z Process 1 exited with error code 10 and exception: 2025-12-04T12:29:55.5461913Z Traceback (most recent call last): 2025-12-04T12:29:55.5462155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5462414Z getattr(self, test_name)() 2025-12-04T12:29:55.5462648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5462879Z fn() 2025-12-04T12:29:55.5463079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5463308Z method(*args, **kwargs) 2025-12-04T12:29:55.5463527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5463759Z method(*args, **kwargs) 2025-12-04T12:29:55.5463977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5464202Z with policy(): 2025-12-04T12:29:55.5464413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5464648Z raise RuntimeError(msg) 2025-12-04T12:29:55.5465025Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5465364Z 2025-12-04T12:29:55.5465442Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5465740Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5465963Z 2025-12-04T12:29:55.5466053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5466178Z 2025-12-04T12:29:55.5466180Z 2025-12-04T12:29:55.5466263Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:55.5466467Z Process 0 terminated with exit code 10, terminating remaining processes. 
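The captured stdout above reflects the multi-process harness these distributed tests use: each rank runs the test body in its own subprocess, and the parent treats any non-zero exit status (code 10 here) as a failure of the whole test and terminates the surviving ranks. A rough, illustrative sketch of that spawn-and-join pattern using the public torch.multiprocessing API follows; the helper name _worker and the EXIT_CODE_ON_ERROR sentinel are placeholders, not the internal MultiProcessTestCase implementation.

import sys
import torch.multiprocessing as mp

EXIT_CODE_ON_ERROR = 10  # placeholder sentinel mirroring the "exit code 10" above

def _worker(rank, world_size):
    # A real rank would set up a process group and run the test body here; the
    # harness converts any exception into a sentinel exit code instead of re-raising.
    try:
        pass  # test body
    except Exception:
        sys.exit(EXIT_CODE_ON_ERROR)

if __name__ == "__main__":
    try:
        # One subprocess per rank; spawn() joins them and raises if any rank
        # exits with a non-zero status, which the runner then reports as
        # "Process N exited with error code 10".
        mp.spawn(_worker, args=(2,), nprocs=2)
    except Exception as exc:
        print(f"rank failure: {exc}")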
2025-12-04T12:29:55.5466879Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-0465fcbb0894d830.xml - 2025-12-04T12:29:55.5467225Z =========================== short test summary info ============================ 2025-12-04T12:29:55.5467539Z FAILED [5.2119s] distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:55.5467828Z Traceback (most recent call last): 2025-12-04T12:29:55.5468074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5468319Z getattr(self, test_name)() 2025-12-04T12:29:55.5468551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5468785Z fn() 2025-12-04T12:29:55.5468991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5469222Z method(*args, **kwargs) 2025-12-04T12:29:55.5469441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5469703Z method(*args, **kwargs) 2025-12-04T12:29:55.5469925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5470176Z with policy(): 2025-12-04T12:29:55.5470387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5470618Z raise RuntimeError(msg) 2025-12-04T12:29:55.5470996Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 
2025-12-04T12:29:55.5471350Z 2025-12-04T12:29:55.5471424Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5471723Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5471948Z 2025-12-04T12:29:55.5472033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5472156Z 2025-12-04T12:29:55.5472217Z Process 1 exited with error code 10 and exception: 2025-12-04T12:29:55.5472354Z Traceback (most recent call last): 2025-12-04T12:29:55.5472593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5472831Z getattr(self, test_name)() 2025-12-04T12:29:55.5473058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5473291Z fn() 2025-12-04T12:29:55.5473487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5473711Z method(*args, **kwargs) 2025-12-04T12:29:55.5473926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5474153Z method(*args, **kwargs) 2025-12-04T12:29:55.5474368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5474592Z with policy(): 2025-12-04T12:29:55.5474798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5475027Z raise RuntimeError(msg) 2025-12-04T12:29:55.5475437Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5475776Z 2025-12-04T12:29:55.5475850Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5476145Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5476366Z 2025-12-04T12:29:55.5476456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5476642Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:55.5476801Z ============================== 1 failed in 5.37s =============================== 2025-12-04T12:29:55.5476931Z Got exit code 1 2025-12-04T12:29:55.5477026Z Retrying single test... 
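The leak itself is reported by the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 mode named in the repro command: the test framework records caching-allocator and driver memory usage before the test and compares it again afterwards, and any growth (512 -> 2560 on the caching allocator here, plus 4 MiB at the driver level) is raised as a RuntimeError. On this ROCm runner the torch.cuda namespace is backed by HIP, which is why the message still says "CUDA driver API" on a gfx942 GPU. Below is a simplified, stand-alone sketch of that before/after comparison using only public torch.cuda calls; it omits the driver-level query and the re-check passes the real checker performs, so treat it as an illustration rather than the actual mechanism.

import torch

def check_allocator_growth(fn, device=0):
    # Run fn() and flag any growth in caching-allocator usage on `device`.
    # Simplified illustration only; the CI checker also consults driver-reported
    # memory and confirms the growth before declaring a leak.
    if not torch.cuda.is_available():
        return fn()
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    result = fn()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"caching allocator grew from {before} to {after} on device {device}"
        )
    return result

# usage sketch (run_single_test is hypothetical): check_allocator_growth(lambda: run_single_test())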
2025-12-04T12:29:55.5477299Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-a7a57420ca89d548.xml 2025-12-04T12:29:55.5477595Z ============================= test session starts ============================== 2025-12-04T12:29:55.5477805Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:55.5477992Z cachedir: .pytest_cache 2025-12-04T12:29:55.5478212Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:55.5478463Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:55.5478580Z configfile: pytest.ini 2025-12-04T12:29:55.5478805Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:55.5479045Z collecting ... collected 1 item 2025-12-04T12:29:55.5479302Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda 2025-12-04T12:29:55.5479615Z Running 1 items in this shard 2025-12-04T12:29:55.5479687Z 2025-12-04T12:29:55.5479962Z distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda I1204 12:29:39.658000 318271 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 318340 2025-12-04T12:29:55.5480423Z I1204 12:29:39.659000 318271 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 318341 2025-12-04T12:29:55.5480757Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:55.5481097Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:55.5481587Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5482065Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:55.5482539Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5482980Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:55.5483416Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5483881Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5484374Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5484832Z [rank0]:E1204 12:29:43.421000 318340 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5485290Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5485742Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:55.5486197Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5486659Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:55.5487277Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 2025-12-04T12:29:55.5487872Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5488217Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5488782Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5489243Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5489641Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5490051Z [rank0]:E1204 12:29:43.421000 318340 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:55.5490288Z dist init r=0, world=2 2025-12-04T12:29:55.5490489Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:55.5490825Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:55.5491311Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5491784Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:55.5492261Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5492708Z [rank1]:E1204 12:29:43.427000 318341 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:55.5493176Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5493636Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5494094Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5494554Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5495012Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5495464Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:55.5495916Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5496376Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:55.5496989Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5497583Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5497947Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5498490Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5498954Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5499316Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5499773Z [rank1]:E1204 12:29:43.427000 318341 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:29:55.5500013Z dist init r=1, world=2 2025-12-04T12:29:55.5500409Z [rank0]:[W1204 12:29:43.578553682 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:55.5500814Z FAILED [5.1111s] [100%] 2025-12-04T12:29:55.5500876Z 2025-12-04T12:29:55.5500934Z =================================== FAILURES =================================== 2025-12-04T12:29:55.5501112Z ___________________ TestTraversalCUDA.test_fsdp_modules_cuda ___________________ 2025-12-04T12:29:55.5501275Z Traceback (most recent call last): 2025-12-04T12:29:55.5501517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:55.5501759Z self._join_processes(fn) 2025-12-04T12:29:55.5502003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:55.5502324Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:55.5502589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:55.5502848Z raise RuntimeError(error) 2025-12-04T12:29:55.5502995Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:55.5503155Z Traceback (most recent call last): 2025-12-04T12:29:55.5503393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5503633Z getattr(self, test_name)() 2025-12-04T12:29:55.5503863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5504090Z fn() 2025-12-04T12:29:55.5504291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5504522Z method(*args, **kwargs) 2025-12-04T12:29:55.5504740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5504966Z method(*args, **kwargs) 2025-12-04T12:29:55.5505183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5505407Z with policy(): 2025-12-04T12:29:55.5505635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5505863Z raise RuntimeError(msg) 2025-12-04T12:29:55.5506236Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 2025-12-04T12:29:55.5506594Z 2025-12-04T12:29:55.5506671Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5506972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5507196Z 2025-12-04T12:29:55.5507283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5507407Z 2025-12-04T12:29:55.5507409Z 2025-12-04T12:29:55.5507484Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:55.5507684Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:29:55.5508061Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-a7a57420ca89d548.xml - 2025-12-04T12:29:55.5508401Z =========================== short test summary info ============================ 2025-12-04T12:29:55.5508710Z FAILED [5.1111s] distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:29:55.5508996Z Traceback (most recent call last): 2025-12-04T12:29:55.5509236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5509478Z getattr(self, test_name)() 2025-12-04T12:29:55.5509757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5509989Z fn() 2025-12-04T12:29:55.5510188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5510417Z method(*args, **kwargs) 2025-12-04T12:29:55.5510635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5510900Z method(*args, **kwargs) 2025-12-04T12:29:55.5511118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5511340Z with policy(): 2025-12-04T12:29:55.5511549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5511776Z raise RuntimeError(msg) 2025-12-04T12:29:55.5512149Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 2025-12-04T12:29:55.5512491Z 2025-12-04T12:29:55.5512564Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5512866Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5513092Z 2025-12-04T12:29:55.5513178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5513367Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:29:55.5513524Z ============================== 1 failed in 5.27s =============================== 2025-12-04T12:29:55.5513653Z Got exit code 1 2025-12-04T12:29:55.5513748Z Retrying single test... 
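Every attempt also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. That warning is separate from the leak check: the spawned ranks exit via the sentinel code without tearing down the process group. The documented remedy is an explicit teardown, shown in this minimal, self-contained sketch (single process, gloo backend, illustrative address/port values), not a reproduction of the test harness itself.

import os
import torch.distributed as dist

def main():
    # Minimal single-process group, only to show the init/teardown pairing.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
    try:
        pass  # collective work / test body would go here
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called
        # before program exit" warning seen in the log above.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()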
2025-12-04T12:29:55.5514015Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-fcbdb9cd508712d5.xml 2025-12-04T12:29:55.5514328Z ============================= test session starts ============================== 2025-12-04T12:29:55.5514537Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:29:55.5514741Z cachedir: .pytest_cache 2025-12-04T12:29:55.5514965Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:29:55.5515203Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:29:55.5515320Z configfile: pytest.ini 2025-12-04T12:29:55.5515544Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:29:55.5515787Z collecting ... collected 1 item 2025-12-04T12:29:55.5516046Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda 2025-12-04T12:29:55.5516307Z Running 1 items in this shard 2025-12-04T12:29:55.5516380Z 2025-12-04T12:29:55.5516652Z distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda I1204 12:29:47.249000 318499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 318568 2025-12-04T12:29:55.5517114Z I1204 12:29:47.249000 318499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 318569 2025-12-04T12:29:55.5517442Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:55.5517779Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:55.5518263Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5518737Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:55.5519240Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5519724Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:55.5520160Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5520620Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5521084Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5521545Z [rank1]:E1204 12:29:51.027000 318569 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5522011Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5522465Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:55.5522917Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5523398Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:55.5524015Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5524608Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5524952Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5525495Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5525957Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5526321Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5526731Z [rank1]:E1204 12:29:51.027000 318569 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:29:55.5526968Z dist init r=1, world=2 2025-12-04T12:29:55.5527167Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:29:55.5527500Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:29:55.5527981Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5528489Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:29:55.5528965Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5529407Z [rank0]:E1204 12:29:51.054000 318568 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:29:55.5529879Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5530342Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5530808Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5531270Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:29:55.5531729Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5532191Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:29:55.5532639Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5533164Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:29:55.5533779Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 2017460224 and is now 2021654528. 2025-12-04T12:29:55.5534356Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5534701Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5535248Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5535717Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:29:55.5536080Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5536490Z [rank0]:E1204 12:29:51.054000 318568 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:29:55.5536728Z dist init r=0, world=2 2025-12-04T12:29:55.5537260Z [rank0]:[W1204 12:29:51.244603609 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:29:55.5537664Z FAILED [5.2120s] [100%] 2025-12-04T12:29:55.5537726Z 2025-12-04T12:29:55.5537826Z =================================== FAILURES =================================== 2025-12-04T12:29:55.5538004Z ___________________ TestTraversalCUDA.test_fsdp_modules_cuda ___________________ 2025-12-04T12:29:55.5538169Z Traceback (most recent call last): 2025-12-04T12:29:55.5538412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:29:55.5538651Z self._join_processes(fn) 2025-12-04T12:29:55.5538892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:29:55.5539150Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:29:55.5539413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:29:55.5539708Z raise RuntimeError(error) 2025-12-04T12:29:55.5539857Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:29:55.5540018Z Traceback (most recent call last): 2025-12-04T12:29:55.5540253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5540491Z getattr(self, test_name)() 2025-12-04T12:29:55.5540720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5540949Z fn() 2025-12-04T12:29:55.5541149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5541392Z method(*args, **kwargs) 2025-12-04T12:29:55.5541609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5541835Z method(*args, **kwargs) 2025-12-04T12:29:55.5542065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5542290Z with policy(): 2025-12-04T12:29:55.5542497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5542721Z raise RuntimeError(msg) 2025-12-04T12:29:55.5543089Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5543430Z 2025-12-04T12:29:55.5543503Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5543801Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5544023Z 2025-12-04T12:29:55.5544111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5544232Z 2025-12-04T12:29:55.5544235Z 2025-12-04T12:29:55.5544311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:29:55.5544506Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:29:55.5544884Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-fcbdb9cd508712d5.xml - 2025-12-04T12:29:55.5545228Z =========================== short test summary info ============================ 2025-12-04T12:29:55.5545536Z FAILED [5.2120s] distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:29:55.5545821Z Traceback (most recent call last): 2025-12-04T12:29:55.5546067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:29:55.5546312Z getattr(self, test_name)() 2025-12-04T12:29:55.5546576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:29:55.5546810Z fn() 2025-12-04T12:29:55.5547009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5547236Z method(*args, **kwargs) 2025-12-04T12:29:55.5547454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:29:55.5547682Z method(*args, **kwargs) 2025-12-04T12:29:55.5547898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:29:55.5548121Z with policy(): 2025-12-04T12:29:55.5548332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:29:55.5548565Z raise RuntimeError(msg) 2025-12-04T12:29:55.5548937Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestTraversalCUDA.test_fsdp_modules_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 1864368128 and is now 1868562432. 2025-12-04T12:29:55.5549279Z 2025-12-04T12:29:55.5549353Z To execute this test, run the following from the base repo dir: 2025-12-04T12:29:55.5549691Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_traversal.py TestTraversalCUDA.test_fsdp_modules_cuda 2025-12-04T12:29:55.5549938Z 2025-12-04T12:29:55.5550024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:29:55.5550210Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:29:55.5550366Z ============================== 1 failed in 5.35s ===============================
2025-12-04T12:29:55.5550511Z Got exit code 1
2025-12-04T12:29:55.5550710Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda
2025-12-04T12:29:55.5551013Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:29:55.5551378Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-6360f1f260e2dc0b.xml
2025-12-04T12:29:55.5551674Z ============================= test session starts ==============================
2025-12-04T12:29:55.5551885Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:29:55.5552074Z cachedir: .pytest_cache
2025-12-04T12:29:55.5552300Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:29:55.5552538Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T12:29:55.5552656Z configfile: pytest.ini
2025-12-04T12:29:55.5552884Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T12:29:55.5553150Z collecting ... collected 1 item / 1 deselected / 0 selected
2025-12-04T12:29:55.5553307Z stepcurrent: skipping 1 already run items.
2025-12-04T12:29:55.5553433Z Running 0 items in this shard
2025-12-04T12:29:55.5553508Z
2025-12-04T12:29:55.5553755Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_traversal/distributed.fsdp.test_fsdp_traversal-6360f1f260e2dc0b.xml -
2025-12-04T12:29:55.5554097Z ============================ 1 deselected in 0.00s =============================
2025-12-04T12:29:55.5554360Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_traversal.py::TestTraversalCUDA::test_fsdp_modules_cuda']
2025-12-04T12:29:55.5554563Z
2025-12-04T12:29:55.5554791Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_traversal 1/1 (test/test-reports/distributed.fsdp.test_fsdp_traversal_1.1_61a5791dc0397606_.log)
2025-12-04T12:29:55.5555021Z
2025-12-04T12:29:55.5555153Z Finished distributed/fsdp/test_fsdp_traversal 1/1 ... [2025-12-04 12:29:55.541341][5229436.520378029], took 0.42min
2025-12-04T12:29:55.5555594Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml
2025-12-04T12:29:55.5555980Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T12:29:55.5556196Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading
2025-12-04T12:29:55.5556377Z Uploading artifacts took 0.00 seconds
2025-12-04T12:29:55.5556517Z distributed/fsdp/test_fsdp_traversal 1/1 failed!
2025-12-04T12:29:55.5556731Z Running distributed/fsdp/test_fsdp_ignored_modules 1/1 ... [2025-12-04 12:29:55.544353][5229436.523394849]
2025-12-04T12:29:55.5556937Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T12:29:55.5557360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_ignored_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-12-04 12:29:55.544542] 2025-12-04T12:30:48.0842556Z 2025-12-04T12:30:48.0843173Z distributed/fsdp/test_fsdp_ignored_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_ignored_modules_1.1_82e52ac70f0a3012_.log 2025-12-04T12:30:48.0845366Z Running 8 items in this shard: test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_diff_ignored_modules_across_ranks, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_invalid, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_nested, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_not_under_wrapped_root_ignore_modules_False, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_not_under_wrapped_root_ignore_modules_True, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_transformer, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_states_auto_wrap, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_states_check 2025-12-04T12:30:48.0846777Z 2025-12-04T12:30:48.0846931Z Finished distributed/fsdp/test_fsdp_ignored_modules 1/1 ... [2025-12-04 12:30:48.083941][5229489.062979017], took 0.88min 2025-12-04T12:30:48.0861304Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:30:48.0872616Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:30:48.0874880Z Running distributed/fsdp/test_checkpoint_wrapper 1/1 ... [2025-12-04 12:30:48.087390][5229489.066431092] 2025-12-04T12:30:48.0875100Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:30:48.0876838Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_checkpoint_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:30:48.087586] 2025-12-04T12:30:53.2107301Z 2025-12-04T12:30:53.2107937Z distributed/fsdp/test_checkpoint_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_checkpoint_wrapper_1.1_3876383eab787904_.log 2025-12-04T12:30:53.2110133Z Running 8 items in this shard: test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_apply_activation_checkpointing, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_checkpoint_wrapper_args_kwargs, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_checkpoint_wrapper_cpu_offload, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_checkpoint_wrapper_kwarg_support, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_checkpoint_wrapper_parity, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_forward_missing_attributes, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_fqn, test/distributed/fsdp/test_checkpoint_wrapper.py::CheckpointWrapperTest::test_load_activation_checkpointed_module 2025-12-04T12:30:53.2111419Z 2025-12-04T12:30:53.2111574Z Finished distributed/fsdp/test_checkpoint_wrapper 1/1 ... [2025-12-04 12:30:53.210438][5229494.189476779], took 0.09min 2025-12-04T12:30:53.2124918Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:30:53.2135308Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:30:53.2137541Z Running distributed/fsdp/test_fsdp_checkpoint 1/1 ... [2025-12-04 12:30:53.213678][5229494.192719747] 2025-12-04T12:30:53.2137759Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:30:53.2139732Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:30:53.213871] 2025-12-04T12:33:42.6898536Z 2025-12-04T12:33:42.6899739Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_checkpoint 1/1 (test/test-reports/distributed.fsdp.test_fsdp_checkpoint_1.1_c244fb9a4f737098_.log) 2025-12-04T12:33:42.6901462Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-ef762e98490b0b73.xml 2025-12-04T12:33:42.6902210Z ============================= test session starts ============================== 2025-12-04T12:33:42.6902714Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:42.6903161Z cachedir: .pytest_cache 2025-12-04T12:33:42.6903648Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:42.6904159Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:33:42.6904401Z configfile: pytest.ini 2025-12-04T12:33:42.6904868Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:33:42.6905959Z collecting ... 
/var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:292: PytestCollectionWarning: cannot collect test class 'TestModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_checkpoint.py) 2025-12-04T12:33:42.6906815Z class TestModel(nn.Module): 2025-12-04T12:33:42.6907040Z collected 17 items 2025-12-04T12:33:42.6907279Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:33:42.6914441Z Running 17 items in this shard: test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.6919814Z 2025-12-04T12:33:42.6920426Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_False_use_orig_params_False I1204 12:30:54.991000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 320442 2025-12-04T12:33:42.6921212Z I1204 12:30:54.992000 
320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 320443 2025-12-04T12:33:42.6921703Z I1204 12:30:54.992000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 320444 2025-12-04T12:33:42.6922184Z I1204 12:30:54.993000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 320445 2025-12-04T12:33:42.6922875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6923367Z return func(*args, **kwargs) 2025-12-04T12:33:42.6923494Z dist init r=0, world=4 2025-12-04T12:33:42.6923605Z dist init r=3, world=4 2025-12-04T12:33:42.6923712Z dist init r=1, world=4 2025-12-04T12:33:42.6923817Z dist init r=2, world=4 2025-12-04T12:33:42.6923924Z PASSED [8.7186s] [ 5%] 2025-12-04T12:33:42.6924394Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_False_use_orig_params_True I1204 12:31:03.713000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 320775 2025-12-04T12:33:42.6924999Z I1204 12:31:03.714000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 320776 2025-12-04T12:33:42.6925375Z I1204 12:31:03.714000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 320777 2025-12-04T12:33:42.6925807Z I1204 12:31:03.715000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 320778 2025-12-04T12:33:42.6926331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6926736Z return func(*args, **kwargs) 2025-12-04T12:33:42.6926857Z dist init r=0, world=4 2025-12-04T12:33:42.6926964Z dist init r=3, world=4 2025-12-04T12:33:42.6927074Z dist init r=1, world=4 2025-12-04T12:33:42.6927180Z dist init r=2, world=4 2025-12-04T12:33:42.6927286Z PASSED [8.1161s] [ 11%] 2025-12-04T12:33:42.6927752Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_True_use_orig_params_False I1204 12:31:11.830000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 321108 2025-12-04T12:33:42.6928361Z I1204 12:31:11.831000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 321109 2025-12-04T12:33:42.6928734Z I1204 12:31:11.831000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 321110 2025-12-04T12:33:42.6929106Z I1204 12:31:11.832000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 321111 2025-12-04T12:33:42.6929668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
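The barrier() UserWarning that repeats throughout this run states that it can be muted by passing `device_id` to `init_process_group`. A minimal, hypothetical sketch (not part of this test suite; the rank, world size, and rendezvous variables are assumed to be provided by the launcher):

    import os
    import torch
    import torch.distributed as dist

    # Assumed: RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT are set by the launcher.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    torch.cuda.set_device(rank)  # bind this process to its GPU
    dist.init_process_group(
        backend="nccl",  # RCCL on ROCm builds
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),  # explicit device mutes the barrier() warning
    )
    dist.barrier()  # no "using the device under current context" warning expected here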
2025-12-04T12:33:42.6930103Z return func(*args, **kwargs) 2025-12-04T12:33:42.6930223Z dist init r=0, world=4 2025-12-04T12:33:42.6930330Z dist init r=3, world=4 2025-12-04T12:33:42.6930437Z dist init r=1, world=4 2025-12-04T12:33:42.6930542Z dist init r=2, world=4 2025-12-04T12:33:42.6930668Z PASSED [8.3163s] [ 17%] 2025-12-04T12:33:42.6931131Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload0_offload_activations_True_use_orig_params_True I1204 12:31:20.148000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 321441 2025-12-04T12:33:42.6931729Z I1204 12:31:20.149000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 321442 2025-12-04T12:33:42.6932103Z I1204 12:31:20.150000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 321443 2025-12-04T12:33:42.6932475Z I1204 12:31:20.150000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 321444 2025-12-04T12:33:42.6932997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6933385Z return func(*args, **kwargs) 2025-12-04T12:33:42.6933496Z dist init r=0, world=4 2025-12-04T12:33:42.6933594Z dist init r=3, world=4 2025-12-04T12:33:42.6933690Z dist init r=2, world=4 2025-12-04T12:33:42.6933785Z dist init r=1, world=4 2025-12-04T12:33:42.6933881Z PASSED [8.3165s] [ 23%] 2025-12-04T12:33:42.6934301Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_False_use_orig_params_False I1204 12:31:28.466000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 321774 2025-12-04T12:33:42.6934849Z I1204 12:31:28.467000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 321775 2025-12-04T12:33:42.6935186Z I1204 12:31:28.468000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 321776 2025-12-04T12:33:42.6935567Z I1204 12:31:28.468000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 321777 2025-12-04T12:33:42.6936040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:33:42.6936402Z return func(*args, **kwargs) 2025-12-04T12:33:42.6936510Z dist init r=0, world=4 2025-12-04T12:33:42.6936607Z dist init r=3, world=4 2025-12-04T12:33:42.6936703Z dist init r=2, world=4 2025-12-04T12:33:42.6936801Z dist init r=1, world=4 2025-12-04T12:33:42.6936898Z PASSED [8.4160s] [ 29%] 2025-12-04T12:33:42.6937316Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_False_use_orig_params_True I1204 12:31:36.884000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 322107 2025-12-04T12:33:42.6937867Z I1204 12:31:36.885000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 322108 2025-12-04T12:33:42.6938205Z I1204 12:31:36.885000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 322109 2025-12-04T12:33:42.6938541Z I1204 12:31:36.885000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 322110 2025-12-04T12:33:42.6939014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6939392Z return func(*args, **kwargs) 2025-12-04T12:33:42.6939501Z dist init r=0, world=4 2025-12-04T12:33:42.6939651Z dist init r=3, world=4 2025-12-04T12:33:42.6939747Z dist init r=2, world=4 2025-12-04T12:33:42.6939846Z dist init r=1, world=4 2025-12-04T12:33:42.6939960Z PASSED [8.2159s] [ 35%] 2025-12-04T12:33:42.6940382Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_True_use_orig_params_False I1204 12:31:45.102000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 322440 2025-12-04T12:33:42.6940924Z I1204 12:31:45.102000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 322441 2025-12-04T12:33:42.6941262Z I1204 12:31:45.102000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 322442 2025-12-04T12:33:42.6941602Z I1204 12:31:45.103000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 322443 2025-12-04T12:33:42.6942074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:33:42.6942441Z return func(*args, **kwargs) 2025-12-04T12:33:42.6942549Z dist init r=0, world=4 2025-12-04T12:33:42.6942647Z dist init r=3, world=4 2025-12-04T12:33:42.6942745Z dist init r=1, world=4 2025-12-04T12:33:42.6942841Z dist init r=2, world=4 2025-12-04T12:33:42.6942938Z PASSED [8.1161s] [ 41%] 2025-12-04T12:33:42.6943353Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_basic_checkpoint_end_to_end_cpu_offload1_offload_activations_True_use_orig_params_True I1204 12:31:53.219000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 322773 2025-12-04T12:33:42.6943896Z I1204 12:31:53.220000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 322774 2025-12-04T12:33:42.6944232Z I1204 12:31:53.220000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 322775 2025-12-04T12:33:42.6944568Z I1204 12:31:53.220000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 322776 2025-12-04T12:33:42.6945089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6945454Z return func(*args, **kwargs) 2025-12-04T12:33:42.6945559Z dist init r=0, world=4 2025-12-04T12:33:42.6945659Z dist init r=3, world=4 2025-12-04T12:33:42.6945756Z dist init r=1, world=4 2025-12-04T12:33:42.6945853Z dist init r=2, world=4 2025-12-04T12:33:42.6945950Z PASSED [8.4152s] [ 47%] 2025-12-04T12:33:42.6946369Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_False_use_orig_params_False I1204 12:32:01.636000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 323106 2025-12-04T12:33:42.6946921Z I1204 12:32:01.636000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 323107 2025-12-04T12:33:42.6947259Z I1204 12:32:01.637000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 323108 2025-12-04T12:33:42.6947602Z I1204 12:32:01.637000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 323109 2025-12-04T12:33:42.6948076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:33:42.6948456Z return func(*args, **kwargs) 2025-12-04T12:33:42.6948564Z dist init r=0, world=4 2025-12-04T12:33:42.6948661Z dist init r=3, world=4 2025-12-04T12:33:42.6948759Z dist init r=1, world=4 2025-12-04T12:33:42.6948852Z dist init r=2, world=4 2025-12-04T12:33:42.6948948Z PASSED [8.0162s] [ 52%] 2025-12-04T12:33:42.6949388Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_False_use_orig_params_True I1204 12:32:09.653000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 323439 2025-12-04T12:33:42.6949969Z I1204 12:32:09.654000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 323440 2025-12-04T12:33:42.6950308Z I1204 12:32:09.654000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 323441 2025-12-04T12:33:42.6950645Z I1204 12:32:09.655000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 323442 2025-12-04T12:33:42.6951116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6951483Z return func(*args, **kwargs) 2025-12-04T12:33:42.6951592Z dist init r=0, world=4 2025-12-04T12:33:42.6951691Z dist init r=3, world=4 2025-12-04T12:33:42.6951789Z dist init r=1, world=4 2025-12-04T12:33:42.6951884Z dist init r=2, world=4 2025-12-04T12:33:42.6951981Z PASSED [8.2152s] [ 58%] 2025-12-04T12:33:42.6952402Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_True_use_orig_params_False I1204 12:32:17.870000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 323772 2025-12-04T12:33:42.6952949Z I1204 12:32:17.871000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 323773 2025-12-04T12:33:42.6953287Z I1204 12:32:17.871000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 323774 2025-12-04T12:33:42.6953624Z I1204 12:32:17.872000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 323775 2025-12-04T12:33:42.6954137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:33:42.6954500Z return func(*args, **kwargs) 2025-12-04T12:33:42.6954608Z dist init r=0, world=4 2025-12-04T12:33:42.6954707Z dist init r=3, world=4 2025-12-04T12:33:42.6954803Z dist init r=2, world=4 2025-12-04T12:33:42.6954897Z dist init r=1, world=4 2025-12-04T12:33:42.6954996Z PASSED [8.3157s] [ 64%] 2025-12-04T12:33:42.6955414Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload0_offload_activations_True_use_orig_params_True I1204 12:32:26.187000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 324105 2025-12-04T12:33:42.6955956Z I1204 12:32:26.188000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 324106 2025-12-04T12:33:42.6956299Z I1204 12:32:26.188000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 324107 2025-12-04T12:33:42.6956638Z I1204 12:32:26.189000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 324108 2025-12-04T12:33:42.6957111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6957490Z return func(*args, **kwargs) 2025-12-04T12:33:42.6957600Z dist init r=0, world=4 2025-12-04T12:33:42.6957698Z dist init r=3, world=4 2025-12-04T12:33:42.6957796Z dist init r=1, world=4 2025-12-04T12:33:42.6957893Z dist init r=2, world=4 2025-12-04T12:33:42.6957991Z PASSED [8.4165s] [ 70%] 2025-12-04T12:33:42.6958431Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_False_use_orig_params_False I1204 12:32:34.606000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 324438 2025-12-04T12:33:42.6958975Z I1204 12:32:34.606000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 324439 2025-12-04T12:33:42.6959314Z I1204 12:32:34.607000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 324440 2025-12-04T12:33:42.6959702Z I1204 12:32:34.607000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 324441 2025-12-04T12:33:42.6960179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:33:42.6960546Z return func(*args, **kwargs) 2025-12-04T12:33:42.6960655Z dist init r=0, world=4 2025-12-04T12:33:42.6960758Z dist init r=3, world=4 2025-12-04T12:33:42.6960854Z dist init r=1, world=4 2025-12-04T12:33:42.6960951Z dist init r=2, world=4 2025-12-04T12:33:42.6961048Z PASSED [8.2167s] [ 76%] 2025-12-04T12:33:42.6961476Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_False_use_orig_params_True I1204 12:32:42.824000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 324771 2025-12-04T12:33:42.6962026Z I1204 12:32:42.824000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 324772 2025-12-04T12:33:42.6962366Z I1204 12:32:42.825000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 324773 2025-12-04T12:33:42.6962704Z I1204 12:32:42.825000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 324774 2025-12-04T12:33:42.6963214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6963576Z return func(*args, **kwargs) 2025-12-04T12:33:42.6963687Z dist init r=0, world=4 2025-12-04T12:33:42.6963785Z dist init r=3, world=4 2025-12-04T12:33:42.6963881Z dist init r=1, world=4 2025-12-04T12:33:42.6963977Z dist init r=2, world=4 2025-12-04T12:33:42.6964074Z PASSED [8.4173s] [ 82%] 2025-12-04T12:33:42.6964495Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_True_use_orig_params_False I1204 12:32:51.243000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 325104 2025-12-04T12:33:42.6965040Z I1204 12:32:51.243000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 325105 2025-12-04T12:33:42.6965382Z I1204 12:32:51.244000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 325106 2025-12-04T12:33:42.6965719Z I1204 12:32:51.244000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 325107 2025-12-04T12:33:42.6966191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:33:42.6966573Z return func(*args, **kwargs) 2025-12-04T12:33:42.6966682Z dist init r=0, world=4 2025-12-04T12:33:42.6966780Z dist init r=3, world=4 2025-12-04T12:33:42.6966876Z dist init r=1, world=4 2025-12-04T12:33:42.6966973Z dist init r=2, world=4 2025-12-04T12:33:42.6967070Z PASSED [8.2160s] [ 88%] 2025-12-04T12:33:42.6967579Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpoint::test_checkpoint_fsdp_wrapping_cpu_offload1_offload_activations_True_use_orig_params_True I1204 12:32:59.460000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 325437 2025-12-04T12:33:42.6968176Z I1204 12:32:59.461000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 325438 2025-12-04T12:33:42.6968638Z I1204 12:32:59.461000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 325439 2025-12-04T12:33:42.6969028Z I1204 12:32:59.462000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 325440 2025-12-04T12:33:42.6969544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:33:42.6977796Z return func(*args, **kwargs) 2025-12-04T12:33:42.6977938Z dist init r=0, world=4 2025-12-04T12:33:42.6978048Z dist init r=3, world=4 2025-12-04T12:33:42.6978151Z dist init r=2, world=4 2025-12-04T12:33:42.6978253Z dist init r=1, world=4 2025-12-04T12:33:42.6978366Z PASSED [8.0142s] [ 94%] 2025-12-04T12:33:42.6978787Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda I1204 12:33:07.476000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 325770 2025-12-04T12:33:42.6979328Z I1204 12:33:07.477000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 325771 2025-12-04T12:33:42.6979729Z I1204 12:33:07.478000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 325772 2025-12-04T12:33:42.6980079Z I1204 12:33:07.478000 320373 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 325773 2025-12-04T12:33:42.6980653Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6981058Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6981689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.6982285Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.6982663Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T12:33:42.6983065Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6983677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.6984289Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.6984666Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6985069Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6985482Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6985880Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6986282Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6986677Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6987075Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6987462Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6987857Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6988255Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6988650Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6989039Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6989434Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6989868Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6990527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
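The FSDP UserWarning above (a `device_id` of `cuda` with no explicit index) suggests either calling `torch.cuda.set_device()` before FSDP initialization or passing an explicit device index as `device_id`. A hypothetical sketch of both options, assuming the default process group is already initialized and `local_rank` is known:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = 0  # illustrative; in the test each worker uses its own rank
    module_a = nn.Linear(8, 8)  # placeholder modules
    module_b = nn.Linear(8, 8)

    # Option 1: make the current device explicit before wrapping.
    torch.cuda.set_device(local_rank)
    wrapped_a = FSDP(module_a)

    # Option 2: pass a device with an explicit index instead of plain "cuda".
    wrapped_b = FSDP(module_b, device_id=torch.device("cuda", local_rank))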
2025-12-04T12:33:42.6991124Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.6991497Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6991885Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6992275Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6992665Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6993066Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6993459Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6993857Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6994275Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6994892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.6995500Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.6995870Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6996257Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6996649Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6997044Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.6997440Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.6997832Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.6999268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7000731Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7002159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7003572Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7004995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7006432Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7007845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7009253Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7009565Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7009960Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7010494Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7010984Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7011473Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7011928Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7012376Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7012852Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7013326Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7013797Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7014290Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7014754Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7015235Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7015705Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7016407Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 2. CUDA driver allocated memory was 2300575744 and is now 3789553664. 2025-12-04T12:33:42.7017073Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7017437Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7018067Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7018613Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7018985Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7019405Z [rank2]:E1204 12:33:14.612000 325772 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:33:42.7019686Z dist init r=2, world=4 2025-12-04T12:33:42.7019928Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7020275Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7020767Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7021256Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7021742Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7022195Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7022641Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7023111Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7023596Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7024061Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7024547Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7025001Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7025461Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7025934Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7026629Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 2025-12-04T12:33:42.7027283Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7027643Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7028268Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7028807Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7029201Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7029660Z [rank3]:E1204 12:33:14.629000 325773 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:33:42.7029910Z dist init r=3, world=4 2025-12-04T12:33:42.7030119Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7030464Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7030963Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7031451Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7031937Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7032394Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7032838Z [rank1]:E1204 12:33:14.646000 325771 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7033324Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7033789Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7034290Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7034751Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7035201Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7035658Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7036129Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7036820Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3806330880. 
2025-12-04T12:33:42.7037468Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7037818Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7038472Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7039007Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7039370Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7039824Z [rank1]:E1204 12:33:14.646000 325771 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:33:42.7040068Z dist init r=1, world=4 2025-12-04T12:33:42.7040268Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7040604Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7041095Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7041575Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7042052Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7042518Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7042955Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7043433Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7043898Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7044360Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7044824Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7045277Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T12:33:42.7045744Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7046215Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7046904Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 0. CUDA driver allocated memory was 2459959296 and is now 3948937216. 2025-12-04T12:33:42.7047568Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7047955Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7048578Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7049111Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7049473Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7049932Z [rank0]:E1204 12:33:14.684000 325770 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:33:42.7050173Z dist init r=0, world=4 2025-12-04T12:33:42.7050602Z [rank0]:[W1204 12:33:14.946474981 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:33:42.7051012Z FAILED [9.0166s] [100%] 2025-12-04T12:33:42.7051080Z 2025-12-04T12:33:42.7051139Z =================================== FAILURES =================================== 2025-12-04T12:33:42.7051355Z _ TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda _ 2025-12-04T12:33:42.7051577Z Traceback (most recent call last): 2025-12-04T12:33:42.7051829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:33:42.7052077Z self._join_processes(fn) 2025-12-04T12:33:42.7052327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:33:42.7052610Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:33:42.7052884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:33:42.7053148Z raise RuntimeError(error) 2025-12-04T12:33:42.7053304Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:33:42.7053467Z Traceback (most recent call last): 2025-12-04T12:33:42.7053715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7053959Z getattr(self, test_name)() 2025-12-04T12:33:42.7054195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7054430Z fn() 2025-12-04T12:33:42.7054638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7054873Z method(*args, **kwargs) 2025-12-04T12:33:42.7055100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7055333Z method(*args, **kwargs) 2025-12-04T12:33:42.7055549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7055771Z with policy(): 2025-12-04T12:33:42.7055980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7056208Z raise RuntimeError(msg) 2025-12-04T12:33:42.7056649Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 2. CUDA driver allocated memory was 2300575744 and is now 3789553664. 
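The ProcessGroupNCCL warning above notes that destroy_process_group() was not called before program exit. A minimal sketch of the explicit teardown it asks for:

    import torch.distributed as dist

    # ... test or training body runs here ...
    if dist.is_initialized():  # guard so the call is also safe in single-process runs
        dist.destroy_process_group()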
2025-12-04T12:33:42.7057092Z 2025-12-04T12:33:42.7057167Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7057539Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7057833Z 2025-12-04T12:33:42.7057923Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7058047Z 2025-12-04T12:33:42.7058105Z Process 3 exited with error code 10 and exception: 2025-12-04T12:33:42.7058241Z Traceback (most recent call last): 2025-12-04T12:33:42.7058480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7058719Z getattr(self, test_name)() 2025-12-04T12:33:42.7058955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7059185Z fn() 2025-12-04T12:33:42.7059383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7059640Z method(*args, **kwargs) 2025-12-04T12:33:42.7059856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7060082Z method(*args, **kwargs) 2025-12-04T12:33:42.7060317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7060539Z with policy(): 2025-12-04T12:33:42.7060746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7060974Z raise RuntimeError(msg) 2025-12-04T12:33:42.7061432Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 2025-12-04T12:33:42.7061838Z 2025-12-04T12:33:42.7061912Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7062278Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7062574Z 2025-12-04T12:33:42.7062663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7062785Z 2025-12-04T12:33:42.7062786Z 2025-12-04T12:33:42.7062866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:42.7063066Z Process 2 terminated with exit code 10, terminating remaining processes. 
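The repeated RuntimeError above is raised by PyTorch's CUDA memory-leak checker (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which snapshots caching-allocator and driver-level memory counters before the test body and compares them afterwards. The following is only a minimal sketch of that comparison, not the actual torch.testing._internal.common_utils.CudaMemoryLeakCheck code, and the check_for_leak helper name is made up for illustration.

import torch

def snapshot(device: int):
    # Caching-allocator view vs. driver view of memory on one device.
    torch.cuda.synchronize(device)
    allocator_bytes = torch.cuda.memory_allocated(device)
    free, total = torch.cuda.mem_get_info(device)
    return allocator_bytes, total - free

def check_for_leak(fn, device: int = 0):
    # Hypothetical helper: run fn() and fail if allocator usage grew,
    # mirroring the "was X and is now Y" numbers printed in the log above.
    before_alloc, before_driver = snapshot(device)
    fn()
    after_alloc, after_driver = snapshot(device)
    if after_alloc > before_alloc:
        raise RuntimeError(
            f"possible leak on device {device}: caching allocator went from "
            f"{before_alloc} to {after_alloc} bytes "
            f"(driver: {before_driver} -> {after_driver} bytes)"
        )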
2025-12-04T12:33:42.7063445Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-ef762e98490b0b73.xml - 2025-12-04T12:33:42.7063789Z =========================== short test summary info ============================ 2025-12-04T12:33:42.7064166Z FAILED [9.0166s] distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:33:42.7064524Z Traceback (most recent call last): 2025-12-04T12:33:42.7064766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7065006Z getattr(self, test_name)() 2025-12-04T12:33:42.7065237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7065467Z fn() 2025-12-04T12:33:42.7065701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7065927Z method(*args, **kwargs) 2025-12-04T12:33:42.7066144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7066369Z method(*args, **kwargs) 2025-12-04T12:33:42.7066585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7066811Z with policy(): 2025-12-04T12:33:42.7067018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7067246Z raise RuntimeError(msg) 2025-12-04T12:33:42.7067689Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 2. CUDA driver allocated memory was 2300575744 and is now 3789553664. 
2025-12-04T12:33:42.7068100Z 2025-12-04T12:33:42.7068172Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7068540Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7068832Z 2025-12-04T12:33:42.7068932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7069057Z 2025-12-04T12:33:42.7069113Z Process 3 exited with error code 10 and exception: 2025-12-04T12:33:42.7069249Z Traceback (most recent call last): 2025-12-04T12:33:42.7069490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7069785Z getattr(self, test_name)() 2025-12-04T12:33:42.7070018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7070246Z fn() 2025-12-04T12:33:42.7070443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7070668Z method(*args, **kwargs) 2025-12-04T12:33:42.7070885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7071111Z method(*args, **kwargs) 2025-12-04T12:33:42.7071325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7071548Z with policy(): 2025-12-04T12:33:42.7071754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7071984Z raise RuntimeError(msg) 2025-12-04T12:33:42.7072426Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 2025-12-04T12:33:42.7072834Z 2025-12-04T12:33:42.7072907Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7073279Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7073571Z 2025-12-04T12:33:42.7073659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7073844Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:33:42.7074012Z =================== 1 failed, 16 passed in 141.51s (0:02:21) =================== 2025-12-04T12:33:42.7074181Z Got exit code 1 2025-12-04T12:33:42.7074274Z Retrying single test... 
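Each failing run above also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. The pattern that warning asks for is sketched below; the backend, address, and port values are placeholders for illustration rather than settings taken from this job, and torch.cuda.set_device(rank) is included because the FSDP `device_id` warnings later in this log recommend an explicit device index as well.

import os
import torch
import torch.distributed as dist

def run(rank: int, world_size: int):
    # Placeholder rendezvous settings for a single-node illustration.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)  # give FSDP/NCCL an explicit device index
    try:
        pass  # test or training body goes here
    finally:
        dist.destroy_process_group()  # explicit teardown avoids the warning at exit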
2025-12-04T12:33:42.7074549Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-d1aea10b4f99f99c.xml 2025-12-04T12:33:42.7074854Z ============================= test session starts ============================== 2025-12-04T12:33:42.7075063Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:42.7075250Z cachedir: .pytest_cache 2025-12-04T12:33:42.7075473Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:42.7075708Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:33:42.7075823Z configfile: pytest.ini 2025-12-04T12:33:42.7076046Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:33:42.7076589Z collecting ... /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:292: PytestCollectionWarning: cannot collect test class 'TestModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_checkpoint.py) 2025-12-04T12:33:42.7076993Z class TestModel(nn.Module): 2025-12-04T12:33:42.7077116Z collected 17 items / 16 deselected / 1 selected 2025-12-04T12:33:42.7077453Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7077794Z Running 1 items in this shard 2025-12-04T12:33:42.7077864Z 2025-12-04T12:33:42.7078202Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda I1204 12:33:19.069000 326103 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 326172 2025-12-04T12:33:42.7078743Z I1204 12:33:19.070000 326103 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 326173 2025-12-04T12:33:42.7079082Z I1204 12:33:19.070000 326103 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 326174 2025-12-04T12:33:42.7079419Z I1204 12:33:19.071000 326103 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 326175 2025-12-04T12:33:42.7079921Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7080311Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7080936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.7081519Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7081882Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T12:33:42.7082263Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7082649Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7083033Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7083458Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7083841Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7084232Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7084609Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7085397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.7085986Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7086351Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7086729Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7087116Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7087517Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7087902Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7088299Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7088686Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7089063Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7089713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T12:33:42.7090303Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7090674Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7091053Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7091432Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7091811Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7092199Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7092579Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7092967Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7093381Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7093981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.7094560Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7094922Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7095298Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7095677Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7096058Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7096442Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7096841Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7098224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7099700Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7101112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7102510Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7103941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7105334Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7106738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7108156Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7108455Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7108792Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7109278Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7109793Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7110272Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7110718Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7111155Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7111618Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7112081Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7112578Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7113038Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7113486Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7113937Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7114399Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7115092Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3806330880. 2025-12-04T12:33:42.7115735Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7116081Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7116711Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7117267Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7117629Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7118036Z [rank1]:E1204 12:33:26.171000 326173 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:33:42.7118276Z dist init r=1, world=4 2025-12-04T12:33:42.7118475Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7118808Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7119288Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7119858Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7120328Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7120770Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7121208Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7121666Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7122156Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7122620Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7123082Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7123534Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7123997Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7124464Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7125154Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 2025-12-04T12:33:42.7125818Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7126167Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7126803Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7127334Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7127700Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7128115Z [rank3]:E1204 12:33:26.193000 326175 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:33:42.7128356Z dist init r=3, world=4 2025-12-04T12:33:42.7128560Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7128898Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7129386Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7129903Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7130385Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7130840Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7131309Z [rank2]:E1204 12:33:26.239000 326174 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7131775Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7132238Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7132703Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7133168Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7133625Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7134078Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7134542Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7135248Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 2. CUDA driver allocated memory was 2300575744 and is now 3789553664. 
2025-12-04T12:33:42.7136169Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7136518Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7137141Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7137679Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7138045Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7138462Z [rank2]:E1204 12:33:26.239000 326174 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:33:42.7138704Z dist init r=2, world=4 2025-12-04T12:33:42.7138907Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7139245Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7139769Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7140250Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7140763Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7141212Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7141654Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7142119Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7142585Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7143053Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7143517Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7143974Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T12:33:42.7144432Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7144912Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7145617Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 0. CUDA driver allocated memory was 2459959296 and is now 3948937216. 2025-12-04T12:33:42.7146267Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7146618Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7147238Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7147776Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7148141Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7148556Z [rank0]:E1204 12:33:26.269000 326172 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:33:42.7148798Z dist init r=0, world=4 2025-12-04T12:33:42.7149204Z [rank0]:[W1204 12:33:26.515633729 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:33:42.7149666Z FAILED [9.0192s] [100%] 2025-12-04T12:33:42.7149734Z 2025-12-04T12:33:42.7149793Z =================================== FAILURES =================================== 2025-12-04T12:33:42.7150034Z _ TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda _ 2025-12-04T12:33:42.7150237Z Traceback (most recent call last): 2025-12-04T12:33:42.7150487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:33:42.7150739Z self._join_processes(fn) 2025-12-04T12:33:42.7150988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:33:42.7151256Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:33:42.7151525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:33:42.7151791Z raise RuntimeError(error) 2025-12-04T12:33:42.7151944Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:33:42.7152107Z Traceback (most recent call last): 2025-12-04T12:33:42.7152354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7152596Z getattr(self, test_name)() 2025-12-04T12:33:42.7152831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7153067Z fn() 2025-12-04T12:33:42.7153271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7153529Z method(*args, **kwargs) 2025-12-04T12:33:42.7153757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7153991Z method(*args, **kwargs) 2025-12-04T12:33:42.7154212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7154457Z with policy(): 2025-12-04T12:33:42.7154675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7154912Z raise RuntimeError(msg) 2025-12-04T12:33:42.7155361Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3806330880. 
2025-12-04T12:33:42.7155778Z 2025-12-04T12:33:42.7155859Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7156233Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7156527Z 2025-12-04T12:33:42.7156623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7156747Z 2025-12-04T12:33:42.7156751Z 2025-12-04T12:33:42.7156833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:42.7157036Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:33:42.7157417Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-d1aea10b4f99f99c.xml - 2025-12-04T12:33:42.7157767Z =========================== short test summary info ============================ 2025-12-04T12:33:42.7158147Z FAILED [9.0192s] distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:33:42.7158504Z Traceback (most recent call last): 2025-12-04T12:33:42.7158751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7159021Z getattr(self, test_name)() 2025-12-04T12:33:42.7159259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7159498Z fn() 2025-12-04T12:33:42.7159734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7159968Z method(*args, **kwargs) 2025-12-04T12:33:42.7160191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7160425Z method(*args, **kwargs) 2025-12-04T12:33:42.7160648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7160877Z with policy(): 2025-12-04T12:33:42.7161093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7161331Z raise RuntimeError(msg) 2025-12-04T12:33:42.7161775Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3806330880. 2025-12-04T12:33:42.7162183Z 2025-12-04T12:33:42.7162258Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7162650Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7162947Z 2025-12-04T12:33:42.7163036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7163244Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:33:42.7163412Z ======================= 1 failed, 16 deselected in 9.03s ======================= 2025-12-04T12:33:42.7163552Z Got exit code 1 2025-12-04T12:33:42.7163650Z Retrying single test... 2025-12-04T12:33:42.7163928Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-0e6d6998b6f1fe1a.xml 2025-12-04T12:33:42.7164233Z ============================= test session starts ============================== 2025-12-04T12:33:42.7164445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:42.7164639Z cachedir: .pytest_cache 2025-12-04T12:33:42.7164867Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:42.7165109Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:33:42.7165231Z configfile: pytest.ini 2025-12-04T12:33:42.7165462Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:33:42.7166002Z collecting ... /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:292: PytestCollectionWarning: cannot collect test class 'TestModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_checkpoint.py) 2025-12-04T12:33:42.7166413Z class TestModel(nn.Module): 2025-12-04T12:33:42.7166542Z collected 17 items / 16 deselected / 1 selected 2025-12-04T12:33:42.7166888Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7167221Z Running 1 items in this shard 2025-12-04T12:33:42.7167297Z 2025-12-04T12:33:42.7167670Z distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda I1204 12:33:30.614000 326505 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 326574 2025-12-04T12:33:42.7168210Z I1204 12:33:30.615000 326505 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 326575 2025-12-04T12:33:42.7168554Z I1204 12:33:30.616000 326505 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 326576 2025-12-04T12:33:42.7168896Z I1204 12:33:30.616000 326505 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 326577 2025-12-04T12:33:42.7169360Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7169784Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7170413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T12:33:42.7171003Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7171377Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7171780Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7172387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.7172984Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7173353Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7173741Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7174131Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7174525Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7174921Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7175310Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7175706Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7176090Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7176483Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7176871Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7177266Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7177677Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7178059Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7178443Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7179052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.7179667Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7180033Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7180415Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7180804Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7181205Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7181597Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7181981Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7182391Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:322: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7182773Z model.checkpoint1 = FSDP(module=model.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7183376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:33:42.7183958Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:33:42.7184399Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:323: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7184780Z model.checkpoint2 = FSDP(module=model.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7185161Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:325: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7185541Z model_ac.checkpoint1 = FSDP(module=model_ac.checkpoint1, **fsdp_kwargs) 2025-12-04T12:33:42.7185930Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:326: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:33:42.7186317Z model_ac.checkpoint2 = FSDP(module=model_ac.checkpoint2, **fsdp_kwargs) 2025-12-04T12:33:42.7187750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7189193Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7190642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7192058Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7193489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7194906Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7196329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:33:42.7197731Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:33:42.7198033Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7198375Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7198864Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7199346Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7199877Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7200328Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7200772Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7201256Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7201721Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7202198Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7202661Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7203114Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7203573Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7204041Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7204739Z [rank2]:E1204 12:33:37.627000 326576 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 2. CUDA driver allocated memory was 2300575744 and is now 3789553664. 2025-12-04T12:33:42.7205388Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7205741Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7206387Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7206922Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7207285Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7207700Z [rank2]:E1204 12:33:37.627000 326576 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:33:42.7207943Z dist init r=2, world=4 2025-12-04T12:33:42.7208147Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7208484Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7208973Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7209455Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7210009Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7210475Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7210919Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7211399Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7211862Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7212327Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7212793Z 
[rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7213245Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7213701Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7214166Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7214855Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 1. CUDA driver allocated memory was 2317352960 and is now 3806330880. 2025-12-04T12:33:42.7215514Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7215901Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7216525Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7217055Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7217425Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7217844Z [rank1]:E1204 12:33:37.632000 326575 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:33:42.7218089Z dist init r=1, world=4 2025-12-04T12:33:42.7218297Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7218637Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7219125Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7219649Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7220127Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7220592Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T12:33:42.7221032Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7221501Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7221966Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7222437Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7222912Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7223370Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:33:42.7223826Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7224296Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7225021Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 
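Annotation: the UserWarning from torch/distributed/fsdp/_init_utils.py:571 repeated above for each rank says FSDP received `device_id` as a bare "cuda" with no index and names two fixes: call `torch.cuda.set_device()` before constructing FSDP, or pass a device with an explicit index. The sketch below is illustrative only and is not taken from the test file; the helper name `wrap_with_explicit_device` and the `module`/`rank` arguments are assumptions, and an initialized default process group is assumed.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_explicit_device(module: torch.nn.Module, rank: int) -> FSDP:
        # Option 1 from the warning: pin the current device before FSDP init.
        torch.cuda.set_device(rank)
        # Option 2 from the warning: pass a device with an explicit index
        # instead of the bare "cuda" string that triggered it.
        return FSDP(module, device_id=torch.device("cuda", rank))

Either option removes the ambiguity the warning is about; the tests above rely on the implicit current device instead, which is why the warning fires once per rank.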
2025-12-04T12:33:42.7225671Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7226021Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7226640Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7227175Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7227543Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7227956Z [rank3]:E1204 12:33:37.675000 326577 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:33:42.7228196Z dist init r=3, world=4 2025-12-04T12:33:42.7228398Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:33:42.7228738Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:33:42.7229242Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7229767Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:33:42.7230260Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7230708Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:33:42.7231151Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7231616Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7232078Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7232547Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:33:42.7233013Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7233469Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T12:33:42.7233926Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7234392Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:33:42.7235118Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 0. CUDA driver allocated memory was 2459959296 and is now 3948937216. 2025-12-04T12:33:42.7235769Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7236121Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7236741Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7237273Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:33:42.7237640Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7238054Z [rank0]:E1204 12:33:37.755000 326574 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:33:42.7238311Z dist init r=0, world=4 2025-12-04T12:33:42.7238713Z [rank0]:[W1204 12:33:38.990647884 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:33:42.7239134Z FAILED [8.9225s] [100%] 2025-12-04T12:33:42.7239201Z 2025-12-04T12:33:42.7239260Z =================================== FAILURES =================================== 2025-12-04T12:33:42.7239476Z _ TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda _ 2025-12-04T12:33:42.7239721Z Traceback (most recent call last): 2025-12-04T12:33:42.7239968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:33:42.7240216Z self._join_processes(fn) 2025-12-04T12:33:42.7240465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:33:42.7240736Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:33:42.7241012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:33:42.7241274Z raise RuntimeError(error) 2025-12-04T12:33:42.7241430Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:33:42.7241599Z Traceback (most recent call last): 2025-12-04T12:33:42.7241844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7242089Z getattr(self, test_name)() 2025-12-04T12:33:42.7242327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7242563Z fn() 2025-12-04T12:33:42.7242770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7243004Z method(*args, **kwargs) 2025-12-04T12:33:42.7243229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7243462Z method(*args, **kwargs) 2025-12-04T12:33:42.7243714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7243946Z with policy(): 2025-12-04T12:33:42.7244163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7244401Z raise RuntimeError(msg) 2025-12-04T12:33:42.7244844Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 
2025-12-04T12:33:42.7245258Z 2025-12-04T12:33:42.7245334Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7245712Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7246010Z 2025-12-04T12:33:42.7246106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7246234Z 2025-12-04T12:33:42.7246235Z 2025-12-04T12:33:42.7246313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:33:42.7246517Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:33:42.7246902Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-0e6d6998b6f1fe1a.xml - 2025-12-04T12:33:42.7247276Z =========================== short test summary info ============================ 2025-12-04T12:33:42.7247654Z FAILED [8.9225s] distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:33:42.7248027Z Traceback (most recent call last): 2025-12-04T12:33:42.7248278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:33:42.7248527Z getattr(self, test_name)() 2025-12-04T12:33:42.7248765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:33:42.7249002Z fn() 2025-12-04T12:33:42.7249206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7249440Z method(*args, **kwargs) 2025-12-04T12:33:42.7249692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:33:42.7249926Z method(*args, **kwargs) 2025-12-04T12:33:42.7250147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:33:42.7250377Z with policy(): 2025-12-04T12:33:42.7250593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:33:42.7250826Z raise RuntimeError(msg) 2025-12-04T12:33:42.7251276Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda! Caching allocator allocated memory was 512 and is now reported as 3236352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3739222016. 2025-12-04T12:33:42.7251687Z 2025-12-04T12:33:42.7251765Z To execute this test, run the following from the base repo dir: 2025-12-04T12:33:42.7252139Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7252432Z 2025-12-04T12:33:42.7252526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:33:42.7252751Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
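Annotation: the failure report above prints the exact repro command (PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_checkpoint.py TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda). Below is a small, hypothetical Python wrapper showing one way to invoke it from the base repo dir; the use of `sys.executable` and `subprocess` is my addition, not something the test harness does.

    import os
    import subprocess
    import sys

    env = dict(os.environ)
    env["PYTORCH_TEST_WITH_ROCM"] = "1"
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"

    # Run from the base repo dir, as the log instructs. While the leak persists
    # the test is expected to fail, so the return code is not checked here.
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_checkpoint.py",
            "TestFSDPCheckpointSubmoduleCUDA.test_checkpoint_submodule_use_reentrant_False_cuda",
        ],
        env=env,
    )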
2025-12-04T12:33:42.7252921Z ======================= 1 failed, 16 deselected in 8.93s ======================= 2025-12-04T12:33:42.7253061Z Got exit code 1 2025-12-04T12:33:42.7253333Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda 2025-12-04T12:33:42.7253707Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:33:42.7254088Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-9f1e911ddb2b895a.xml 2025-12-04T12:33:42.7254393Z ============================= test session starts ============================== 2025-12-04T12:33:42.7254605Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:33:42.7254798Z cachedir: .pytest_cache 2025-12-04T12:33:42.7255031Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:33:42.7255271Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:33:42.7255391Z configfile: pytest.ini 2025-12-04T12:33:42.7255619Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:33:42.7256156Z collecting ... /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_checkpoint.py:292: PytestCollectionWarning: cannot collect test class 'TestModel' because it has a __init__ constructor (from: test/distributed/fsdp/test_fsdp_checkpoint.py) 2025-12-04T12:33:42.7256588Z class TestModel(nn.Module): 2025-12-04T12:33:42.7256718Z collected 17 items / 17 deselected / 0 selected 2025-12-04T12:33:42.7256861Z stepcurrent: skipping 17 already run items. 2025-12-04T12:33:42.7257014Z Running 0 items in this shard 2025-12-04T12:33:42.7257085Z 2025-12-04T12:33:42.7261336Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_checkpoint/distributed.fsdp.test_fsdp_checkpoint-9f1e911ddb2b895a.xml - 2025-12-04T12:33:42.7261706Z ============================ 17 deselected in 0.01s ============================ 2025-12-04T12:33:42.7262053Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_checkpoint.py::TestFSDPCheckpointSubmoduleCUDA::test_checkpoint_submodule_use_reentrant_False_cuda'] 2025-12-04T12:33:42.7262329Z 2025-12-04T12:33:42.7262535Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_checkpoint 1/1 (test/test-reports/distributed.fsdp.test_fsdp_checkpoint_1.1_c244fb9a4f737098_.log) 2025-12-04T12:33:42.7262782Z 2025-12-04T12:33:42.7262917Z Finished distributed/fsdp/test_fsdp_checkpoint 1/1 ... [2025-12-04 12:33:42.690361][5229663.669400142], took 2.82min 2025-12-04T12:33:42.7263371Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:33:42.7263761Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:33:42.7263979Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:33:42.7264156Z Uploading artifacts took 0.00 seconds 2025-12-04T12:33:42.7264299Z distributed/fsdp/test_fsdp_checkpoint 1/1 failed! 2025-12-04T12:33:42.7264505Z Running distributed/fsdp/test_fsdp_fine_tune 1/1 ... 
[2025-12-04 12:33:42.693128][5229663.672169396] 2025-12-04T12:33:42.7264707Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:33:42.7265111Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_fine_tune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:42.693318] 2025-12-04T12:36:07.5448974Z 2025-12-04T12:36:07.5452015Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_fine_tune 1/1 (test/test-reports/distributed.fsdp.test_fsdp_fine_tune_1.1_aed87725c804591d_.log) 2025-12-04T12:36:07.5453073Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-d189db2d31ea5eb7.xml 2025-12-04T12:36:07.5453783Z ============================= test session starts ============================== 2025-12-04T12:36:07.5454359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5454814Z cachedir: .pytest_cache 2025-12-04T12:36:07.5455389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5455913Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5456168Z configfile: pytest.ini 2025-12-04T12:36:07.5456668Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5457191Z collecting ... collected 4 items 2025-12-04T12:36:07.5457485Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:36:07.5458964Z Running 4 items in this shard: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda, test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda, test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda, test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5460465Z 2025-12-04T12:36:07.5461099Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda I1204 12:33:44.536000 326975 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 327044 2025-12-04T12:36:07.5462229Z I1204 12:33:44.537000 326975 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 327045 2025-12-04T12:36:07.5463734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5464940Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5465839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T12:36:07.5466720Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5467296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5467850Z return func(*args, **kwargs) 2025-12-04T12:36:07.5468380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5468918Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5469463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5470041Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5470615Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5471125Z seq = FSDP( 2025-12-04T12:36:07.5471601Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5472138Z seq = FSDP( 2025-12-04T12:36:07.5474127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5475895Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5477565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5479198Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5479546Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5479978Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5480551Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5481102Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5481656Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5482169Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5482715Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5483253Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5483787Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5484323Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5484813Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5485265Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5485719Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5486180Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5486836Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 
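Annotation: the RuntimeError above reports two quantities per device, the caching-allocator allocated bytes and the driver-level allocated bytes, measured before and after the test. The snippet below is a rough, hypothetical illustration of that kind of before/after comparison; the real PYTORCH_TEST_CUDA_MEM_LEAK_CHECK logic lives in torch/testing/_internal/common_utils.py and differs in detail, and the `snapshot` helper and device index 0 are assumptions.

    import torch

    def snapshot(device: int) -> tuple[int, int]:
        torch.cuda.synchronize(device)
        allocator_bytes = torch.cuda.memory_allocated(device)  # caching allocator view
        free, total = torch.cuda.mem_get_info(device)          # driver-level view
        return allocator_bytes, total - free

    before = snapshot(0)
    # ... run the test body here ...
    after = snapshot(0)
    if after[0] > before[0] or after[1] > before[1]:
        print(f"possible leak: allocator {before[0]} -> {after[0]}, "
              f"driver {before[1]} -> {after[1]}")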
2025-12-04T12:36:07.5487449Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5487794Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5488374Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5488865Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5489237Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5489698Z [rank0]:E1204 12:33:51.622000 327044 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5489941Z dist init r=0, world=2 2025-12-04T12:36:07.5490143Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5490478Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5490962Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5491438Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5491954Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5492398Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5492831Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5493292Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5493756Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5494215Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5494682Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5495133Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5495589Z 
[rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5496084Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5496722Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 2025-12-04T12:36:07.5497335Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5497681Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5498247Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5498725Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5499086Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5499493Z [rank1]:E1204 12:33:51.625000 327045 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5499807Z dist init r=1, world=2 2025-12-04T12:36:07.5500220Z [rank0]:[W1204 12:33:51.800387324 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5500626Z FAILED [8.9174s] [ 25%] 2025-12-04T12:36:07.5500690Z 2025-12-04T12:36:07.5500747Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5500935Z ____________ TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda _____________ 2025-12-04T12:36:07.5501109Z Traceback (most recent call last): 2025-12-04T12:36:07.5501384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5501627Z self._join_processes(fn) 2025-12-04T12:36:07.5501871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5502134Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5502399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5502658Z raise RuntimeError(error) 2025-12-04T12:36:07.5502808Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5502967Z Traceback (most recent call last): 2025-12-04T12:36:07.5503206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5503450Z getattr(self, test_name)() 2025-12-04T12:36:07.5503681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5503913Z fn() 2025-12-04T12:36:07.5504115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5504346Z method(*args, **kwargs) 2025-12-04T12:36:07.5504570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5504815Z method(*args, **kwargs) 2025-12-04T12:36:07.5505031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5505255Z with policy(): 2025-12-04T12:36:07.5505471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5505726Z raise RuntimeError(msg) 2025-12-04T12:36:07.5506122Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 2025-12-04T12:36:07.5506494Z 2025-12-04T12:36:07.5506567Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5506885Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5507127Z 2025-12-04T12:36:07.5507216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5507339Z 2025-12-04T12:36:07.5507341Z 2025-12-04T12:36:07.5507421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5507622Z Process 0 terminated with exit code 10, terminating remaining processes. 
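Annotation: two warnings in this log concern process-group lifecycle: the barrier() UserWarning from c10d_logger.py:83 ("You can specify `device_id` in `init_process_group` to mute this warning") and the ProcessGroupNCCL warning above that destroy_process_group() was not called before exit. The sketch below is a minimal per-rank setup/teardown along those lines; the `run` function, the "nccl" backend choice (RCCL on these ROCm runners), and the assumption that the launcher provides `rank`, `world_size`, and the rendezvous environment variables are all mine.

    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int) -> None:
        device = torch.device("cuda", rank)
        torch.cuda.set_device(device)
        # Passing an explicit device mutes the barrier() warning seen above.
        dist.init_process_group("nccl", rank=rank, world_size=world_size,
                                device_id=device)
        try:
            dist.barrier()
            # ... test body ...
        finally:
            # Explicit teardown avoids the destroy_process_group() warning.
            dist.destroy_process_group()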
2025-12-04T12:36:07.5507991Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-d189db2d31ea5eb7.xml - 2025-12-04T12:36:07.5508329Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5508657Z FAILED [8.9174s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5509221Z Traceback (most recent call last): 2025-12-04T12:36:07.5509463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5509752Z getattr(self, test_name)() 2025-12-04T12:36:07.5509981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5510246Z fn() 2025-12-04T12:36:07.5510446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5510671Z method(*args, **kwargs) 2025-12-04T12:36:07.5510887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5511114Z method(*args, **kwargs) 2025-12-04T12:36:07.5511328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5511550Z with policy(): 2025-12-04T12:36:07.5511761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5511989Z raise RuntimeError(msg) 2025-12-04T12:36:07.5512385Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 2025-12-04T12:36:07.5512745Z 2025-12-04T12:36:07.5512819Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5513137Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5513380Z 2025-12-04T12:36:07.5513484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5513670Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5513825Z ============================== 1 failed in 8.93s =============================== 2025-12-04T12:36:07.5513954Z Got exit code 1 2025-12-04T12:36:07.5514048Z Retrying single test... 
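Annotation: the FutureWarning that recurs throughout this log (from torch/distributed/fsdp/wrap.py:91 and test_fsdp_fine_tune.py:123) says the NO_SHARD sharding strategy is deprecated and suggests DistributedDataParallel instead. The sketch below shows that suggested replacement in isolation; the `wrap` helper and the `module`/`rank` arguments are assumptions, and an initialized process group is assumed.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap(module: torch.nn.Module, rank: int) -> torch.nn.Module:
        module = module.to(torch.device("cuda", rank))
        # Deprecated pattern that triggers the warning in the log:
        #   FSDP(module, sharding_strategy=ShardingStrategy.NO_SHARD)
        # Replacement the warning suggests:
        return DDP(module, device_ids=[rank])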
2025-12-04T12:36:07.5514329Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-29338384b66251d1.xml 2025-12-04T12:36:07.5514620Z ============================= test session starts ============================== 2025-12-04T12:36:07.5514829Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5515015Z cachedir: .pytest_cache 2025-12-04T12:36:07.5515240Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5515481Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5515597Z configfile: pytest.ini 2025-12-04T12:36:07.5515822Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5516088Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5516394Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5516668Z Running 1 items in this shard 2025-12-04T12:36:07.5516743Z 2025-12-04T12:36:07.5517035Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda I1204 12:33:55.923000 327211 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 327280 2025-12-04T12:36:07.5517511Z I1204 12:33:55.924000 327211 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 327281 2025-12-04T12:36:07.5518201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5518813Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5519394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5520022Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5520409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5520777Z return func(*args, **kwargs) 2025-12-04T12:36:07.5521131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5521488Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5521840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5522196Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5522539Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5522893Z seq = FSDP( 2025-12-04T12:36:07.5523206Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5523555Z seq = FSDP( 2025-12-04T12:36:07.5524878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5526290Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5527767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5529164Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5529464Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5529849Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5530339Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5530822Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5531301Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5531745Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5532184Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5532661Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5533123Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5533628Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5534086Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5534532Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5534982Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5535441Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5536085Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 
2025-12-04T12:36:07.5536686Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5537032Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5537595Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5538100Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5538462Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5538872Z [rank0]:E1204 12:34:02.947000 327280 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5539113Z dist init r=0, world=2 2025-12-04T12:36:07.5539313Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5539686Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5540170Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5540647Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5541121Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5541579Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5542012Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5542491Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5542950Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5543408Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5543868Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5544315Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5544767Z 
[rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5545230Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5545869Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 2025-12-04T12:36:07.5546468Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5546815Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5547409Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5547887Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5548249Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5548660Z [rank1]:E1204 12:34:02.951000 327281 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5548897Z dist init r=1, world=2 2025-12-04T12:36:07.5549296Z [rank0]:[W1204 12:34:03.121269341 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5549743Z FAILED [8.9169s] [100%] 2025-12-04T12:36:07.5549807Z 2025-12-04T12:36:07.5549864Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5550053Z ____________ TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda _____________ 2025-12-04T12:36:07.5550228Z Traceback (most recent call last): 2025-12-04T12:36:07.5550471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5550734Z self._join_processes(fn) 2025-12-04T12:36:07.5550978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5551244Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5551528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5551785Z raise RuntimeError(error) 2025-12-04T12:36:07.5551934Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5552091Z Traceback (most recent call last): 2025-12-04T12:36:07.5552327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5552568Z getattr(self, test_name)() 2025-12-04T12:36:07.5552799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5553029Z fn() 2025-12-04T12:36:07.5553227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5553457Z method(*args, **kwargs) 2025-12-04T12:36:07.5553679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5553907Z method(*args, **kwargs) 2025-12-04T12:36:07.5554124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5554348Z with policy(): 2025-12-04T12:36:07.5554561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5554789Z raise RuntimeError(msg) 2025-12-04T12:36:07.5555179Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 
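The UserWarning from torch/distributed/fsdp/_init_utils.py earlier in this run says FSDP received `device_id` cuda without an explicit index and suggests either calling torch.cuda.set_device() before FSDP initialization or passing an indexed device. The following is a hedged sketch of that suggestion, not the code from test_fsdp_fine_tune.py: the nn.Linear module, the setup_fsdp helper name, and the localhost rendezvous values are placeholders.

    # Sketch of the fix the FSDP UserWarning above points at: make the device index explicit.
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def setup_fsdp(rank: int, world_size: int) -> FSDP:
        # Placeholder rendezvous settings; a real launcher would provide these.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)                 # make the current device explicit, per the warning
        model = nn.Linear(16, 16)                   # placeholder module, not the test's model
        # An indexed device avoids the "does not have an explicit index" warning.
        return FSDP(model, device_id=torch.device("cuda", rank))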
2025-12-04T12:36:07.5555541Z 2025-12-04T12:36:07.5555616Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5555970Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5556213Z 2025-12-04T12:36:07.5556303Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5556426Z 2025-12-04T12:36:07.5556485Z Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5556624Z Traceback (most recent call last): 2025-12-04T12:36:07.5556869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5557110Z getattr(self, test_name)() 2025-12-04T12:36:07.5557342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5557574Z fn() 2025-12-04T12:36:07.5557773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5558004Z method(*args, **kwargs) 2025-12-04T12:36:07.5558219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5558445Z method(*args, **kwargs) 2025-12-04T12:36:07.5558659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5558882Z with policy(): 2025-12-04T12:36:07.5559092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5559334Z raise RuntimeError(msg) 2025-12-04T12:36:07.5559766Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 2025-12-04T12:36:07.5560136Z 2025-12-04T12:36:07.5560216Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5560538Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5560776Z 2025-12-04T12:36:07.5560864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5560989Z 2025-12-04T12:36:07.5560991Z 2025-12-04T12:36:07.5561067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5561266Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5561632Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-29338384b66251d1.xml - 2025-12-04T12:36:07.5561969Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5562296Z FAILED [8.9169s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5562602Z Traceback (most recent call last): 2025-12-04T12:36:07.5562843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5563084Z getattr(self, test_name)() 2025-12-04T12:36:07.5563314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5563544Z fn() 2025-12-04T12:36:07.5563742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5563970Z method(*args, **kwargs) 2025-12-04T12:36:07.5564192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5564456Z method(*args, **kwargs) 2025-12-04T12:36:07.5564673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5564896Z with policy(): 2025-12-04T12:36:07.5565106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5565335Z raise RuntimeError(msg) 2025-12-04T12:36:07.5565727Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 
2025-12-04T12:36:07.5566089Z 2025-12-04T12:36:07.5566162Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5566482Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5566723Z 2025-12-04T12:36:07.5566809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5566934Z 2025-12-04T12:36:07.5566991Z Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5567128Z Traceback (most recent call last): 2025-12-04T12:36:07.5567367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5567626Z getattr(self, test_name)() 2025-12-04T12:36:07.5567857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5568086Z fn() 2025-12-04T12:36:07.5568287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5568528Z method(*args, **kwargs) 2025-12-04T12:36:07.5568746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5568971Z method(*args, **kwargs) 2025-12-04T12:36:07.5569186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5569409Z with policy(): 2025-12-04T12:36:07.5569662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5569893Z raise RuntimeError(msg) 2025-12-04T12:36:07.5570282Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 2025-12-04T12:36:07.5570644Z 2025-12-04T12:36:07.5570718Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5571035Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5571274Z 2025-12-04T12:36:07.5571362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5571547Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5571711Z ======================= 1 failed, 3 deselected in 8.93s ======================== 2025-12-04T12:36:07.5571849Z Got exit code 1 2025-12-04T12:36:07.5571943Z Retrying single test... 
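Two further warnings in these runs, barrier() "using the device under current context" and "destroy_process_group() was not called before program exit", both concern process-group setup and teardown, and both messages name their own remedy: pass `device_id` to init_process_group and call destroy_process_group before exit. A minimal sketch combining the two suggestions, assuming rank/world_size and the rendezvous environment are provided by a launcher:

    # Sketch addressing the barrier() device warning and the missing destroy_process_group()
    # warning seen in the log above. The run() helper and its wiring are placeholders.
    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int):
        dist.init_process_group(
            "nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),   # mutes the barrier() "current context" warning
        )
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            dist.destroy_process_group()            # avoids the "not called before program exit" warning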
2025-12-04T12:36:07.5572207Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-d614fbc43046ce78.xml 2025-12-04T12:36:07.5572499Z ============================= test session starts ============================== 2025-12-04T12:36:07.5572743Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5572932Z cachedir: .pytest_cache 2025-12-04T12:36:07.5573155Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5573391Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5573508Z configfile: pytest.ini 2025-12-04T12:36:07.5573733Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5573999Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5574304Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5574578Z Running 1 items in this shard 2025-12-04T12:36:07.5574649Z 2025-12-04T12:36:07.5574948Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda I1204 12:34:07.242000 327447 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 327516 2025-12-04T12:36:07.5575422Z I1204 12:34:07.242000 327447 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 327517 2025-12-04T12:36:07.5576108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5576711Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5577293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5577887Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5578273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5578641Z return func(*args, **kwargs) 2025-12-04T12:36:07.5578995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5579351Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5579750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5580106Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5580448Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5580784Z seq = FSDP( 2025-12-04T12:36:07.5581103Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5581436Z seq = FSDP( 2025-12-04T12:36:07.5582788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5584191Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5585603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5587028Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5587329Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5587667Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5588157Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5588638Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5589114Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5589562Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5590043Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5590506Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5590970Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5591438Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5591924Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5592372Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5592822Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5593284Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5593932Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 
2025-12-04T12:36:07.5594530Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5594875Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5595437Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5595947Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5596311Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5596749Z [rank1]:E1204 12:34:14.327000 327517 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5596988Z dist init r=1, world=2 2025-12-04T12:36:07.5597187Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5597518Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5598000Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5598474Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5598952Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5599396Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5599875Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5600335Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5600833Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5601294Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5601752Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5602199Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5602649Z 
[rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5603111Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5603750Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 2025-12-04T12:36:07.5604345Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5604706Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5605268Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5605762Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5606120Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5606529Z [rank0]:E1204 12:34:14.328000 327516 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5606767Z dist init r=0, world=2 2025-12-04T12:36:07.5607163Z [rank0]:[W1204 12:34:14.503720322 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5607567Z FAILED [8.9156s] [100%] 2025-12-04T12:36:07.5607633Z 2025-12-04T12:36:07.5607690Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5607875Z ____________ TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda _____________ 2025-12-04T12:36:07.5608045Z Traceback (most recent call last): 2025-12-04T12:36:07.5608287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5608529Z self._join_processes(fn) 2025-12-04T12:36:07.5608777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5609040Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5609306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5609563Z raise RuntimeError(error) 2025-12-04T12:36:07.5609769Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5609969Z Traceback (most recent call last): 2025-12-04T12:36:07.5610214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5610462Z getattr(self, test_name)() 2025-12-04T12:36:07.5610701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5610937Z fn() 2025-12-04T12:36:07.5611143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5611379Z method(*args, **kwargs) 2025-12-04T12:36:07.5611604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5611838Z method(*args, **kwargs) 2025-12-04T12:36:07.5612062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5612295Z with policy(): 2025-12-04T12:36:07.5612516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5612754Z raise RuntimeError(msg) 2025-12-04T12:36:07.5613156Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 
2025-12-04T12:36:07.5613534Z 2025-12-04T12:36:07.5613611Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5613944Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5614194Z 2025-12-04T12:36:07.5614299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5614426Z 2025-12-04T12:36:07.5614489Z Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5614633Z Traceback (most recent call last): 2025-12-04T12:36:07.5614877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5615124Z getattr(self, test_name)() 2025-12-04T12:36:07.5615364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5615602Z fn() 2025-12-04T12:36:07.5615807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5616043Z method(*args, **kwargs) 2025-12-04T12:36:07.5616266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5616501Z method(*args, **kwargs) 2025-12-04T12:36:07.5616723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5616955Z with policy(): 2025-12-04T12:36:07.5617171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5617408Z raise RuntimeError(msg) 2025-12-04T12:36:07.5617802Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 2025-12-04T12:36:07.5618168Z 2025-12-04T12:36:07.5618248Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5618569Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5618813Z 2025-12-04T12:36:07.5618931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5619056Z 2025-12-04T12:36:07.5619057Z 2025-12-04T12:36:07.5619139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5619342Z Process 0 terminated with exit code 10, terminating remaining processes. 
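The FutureWarning repeated throughout these sessions notes that the `NO_SHARD` sharding strategy is deprecated and points at DistributedDataParallel instead. Purely as an illustration of that pointer, and not as a change to test_fsdp_fine_tune.py, a no-sharding wrap with DDP could look like the sketch below; wrap_without_sharding is a made-up helper name and the module is a placeholder.

    # Sketch of the replacement the NO_SHARD FutureWarning suggests: DDP instead of FSDP(NO_SHARD).
    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model: nn.Module, rank: int) -> nn.Module:
        model = model.to(torch.device("cuda", rank))
        # DDP replicates parameters across ranks rather than sharding them,
        # which matches what NO_SHARD provided.
        return DDP(model, device_ids=[rank])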
2025-12-04T12:36:07.5619752Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-d614fbc43046ce78.xml - 2025-12-04T12:36:07.5620097Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5620433Z FAILED [8.9156s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5620747Z Traceback (most recent call last): 2025-12-04T12:36:07.5621001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5621251Z getattr(self, test_name)() 2025-12-04T12:36:07.5621487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5621725Z fn() 2025-12-04T12:36:07.5621929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5622188Z method(*args, **kwargs) 2025-12-04T12:36:07.5622411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5622646Z method(*args, **kwargs) 2025-12-04T12:36:07.5622869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5623121Z with policy(): 2025-12-04T12:36:07.5623341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5623576Z raise RuntimeError(msg) 2025-12-04T12:36:07.5623975Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 0. CUDA driver allocated memory was 2019557376 and is now 3539992576. 
2025-12-04T12:36:07.5624338Z 2025-12-04T12:36:07.5624414Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5624739Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5624985Z 2025-12-04T12:36:07.5625073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5625204Z 2025-12-04T12:36:07.5625265Z Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5625409Z Traceback (most recent call last): 2025-12-04T12:36:07.5625654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5625902Z getattr(self, test_name)() 2025-12-04T12:36:07.5626139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5626376Z fn() 2025-12-04T12:36:07.5626584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5626818Z method(*args, **kwargs) 2025-12-04T12:36:07.5627040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5627273Z method(*args, **kwargs) 2025-12-04T12:36:07.5627532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5627764Z with policy(): 2025-12-04T12:36:07.5627979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5628215Z raise RuntimeError(msg) 2025-12-04T12:36:07.5628612Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda! Caching allocator allocated memory was 512 and is now reported as 88064 on device 1. CUDA driver allocated memory was 1864368128 and is now 3384803328. 2025-12-04T12:36:07.5628978Z 2025-12-04T12:36:07.5629053Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5629375Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5629656Z 2025-12-04T12:36:07.5629747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5629941Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
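The AccumulateGrad stream-mismatch UserWarning emitted in each run names its own switch for the case where the mismatch is intentional: torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False). A minimal sketch of using it, with a toy CPU tensor standing in for a real CUDA/stream workload:

    # Sketch using the switch named in the AccumulateGrad stream-mismatch warning above.
    # Only appropriate when the stream mismatch is known to be intentional.
    import torch

    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

    # Backward then runs without the UserWarning from autograd/input_buffer.cpp.
    loss = (torch.randn(4, requires_grad=True) ** 2).sum()
    loss.backward()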
2025-12-04T12:36:07.5630109Z ======================= 1 failed, 3 deselected in 8.93s ======================== 2025-12-04T12:36:07.5630249Z Got exit code 1 2025-12-04T12:36:07.5630468Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda 2025-12-04T12:36:07.5630789Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:36:07.5631183Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-a791063d57910165.xml 2025-12-04T12:36:07.5631483Z ============================= test session starts ============================== 2025-12-04T12:36:07.5631713Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5631912Z cachedir: .pytest_cache 2025-12-04T12:36:07.5632141Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5632384Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5632509Z configfile: pytest.ini 2025-12-04T12:36:07.5632740Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5633016Z collecting ... collected 4 items / 1 deselected / 3 selected 2025-12-04T12:36:07.5633179Z stepcurrent: skipping 1 already run items. 2025-12-04T12:36:07.5633314Z Running 3 items in this shard 2025-12-04T12:36:07.5633392Z 2025-12-04T12:36:07.5633682Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda I1204 12:34:18.513000 327683 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 327752 2025-12-04T12:36:07.5634168Z I1204 12:34:18.514000 327683 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 327753 2025-12-04T12:36:07.5634858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5635445Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5636052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5636637Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5637026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5637397Z return func(*args, **kwargs) 2025-12-04T12:36:07.5637755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5638117Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5638472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5638835Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5639192Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5639535Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5639900Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5640262Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5641625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5643044Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5644455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5645862Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5646171Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5646543Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5647034Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5647516Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5647997Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5648446Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5648897Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5649363Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5649882Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5650363Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5650828Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5651300Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5651764Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5652232Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5652880Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 1. CUDA driver allocated memory was 1864368128 and is now 3388997632. 
2025-12-04T12:36:07.5653485Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5653837Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5654415Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5654905Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5655273Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5655723Z [rank1]:E1204 12:34:26.868000 327753 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5655970Z dist init r=1, world=2 2025-12-04T12:36:07.5656176Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5656519Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5657012Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5657496Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5657981Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5658437Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5658880Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5659361Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5659876Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5660359Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5660827Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5661282Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5661746Z 
[rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5662214Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5662857Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 2025-12-04T12:36:07.5663459Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5663811Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5664383Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5664872Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5665273Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5665690Z [rank0]:E1204 12:34:26.883000 327752 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5665937Z dist init r=0, world=2 2025-12-04T12:36:07.5666342Z [rank0]:[W1204 12:34:27.082945502 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5666755Z FAILED [10.2179s] [ 33%] 2025-12-04T12:36:07.5666822Z 2025-12-04T12:36:07.5666884Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5667074Z _____________ TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda _____________ 2025-12-04T12:36:07.5667252Z Traceback (most recent call last): 2025-12-04T12:36:07.5667499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5667749Z self._join_processes(fn) 2025-12-04T12:36:07.5667999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5668268Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5668556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5668820Z raise RuntimeError(error) 2025-12-04T12:36:07.5668976Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5669143Z Traceback (most recent call last): 2025-12-04T12:36:07.5678218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5678496Z getattr(self, test_name)() 2025-12-04T12:36:07.5678737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5678972Z fn() 2025-12-04T12:36:07.5679178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5679413Z method(*args, **kwargs) 2025-12-04T12:36:07.5679682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5679915Z method(*args, **kwargs) 2025-12-04T12:36:07.5680135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5680362Z with policy(): 2025-12-04T12:36:07.5680579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5680813Z raise RuntimeError(msg) 2025-12-04T12:36:07.5681210Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 1. CUDA driver allocated memory was 1864368128 and is now 3388997632. 2025-12-04T12:36:07.5681574Z 2025-12-04T12:36:07.5681653Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5681972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5682218Z 2025-12-04T12:36:07.5682310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5682437Z 2025-12-04T12:36:07.5682440Z 2025-12-04T12:36:07.5682523Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5682786Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5683159Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-a791063d57910165.xml - 2025-12-04T12:36:07.5683501Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5683835Z FAILED [10.2179s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5684146Z Traceback (most recent call last): 2025-12-04T12:36:07.5684391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5684638Z getattr(self, test_name)() 2025-12-04T12:36:07.5684877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5685111Z fn() 2025-12-04T12:36:07.5685314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5685546Z method(*args, **kwargs) 2025-12-04T12:36:07.5685765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5685994Z method(*args, **kwargs) 2025-12-04T12:36:07.5686229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5686456Z with policy(): 2025-12-04T12:36:07.5686671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5686904Z raise RuntimeError(msg) 2025-12-04T12:36:07.5687322Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 1. CUDA driver allocated memory was 1864368128 and is now 3388997632. 2025-12-04T12:36:07.5687684Z 2025-12-04T12:36:07.5687760Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5688083Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5688329Z 2025-12-04T12:36:07.5688416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5688606Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5688775Z ======================= 1 failed, 1 deselected in 10.23s ======================= 2025-12-04T12:36:07.5688913Z Got exit code 1 2025-12-04T12:36:07.5689015Z Retrying single test... 
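The FutureWarning repeated above deprecates FSDP's `NO_SHARD` sharding strategy and points to `DistributedDataParallel` instead. A minimal sketch of that switch is below; it is only an illustration, with a placeholder module and rank rather than this test's own setup, and it also sets the device explicitly, which is what the recurring `device_id` UserWarning from fsdp/_init_utils.py asks for.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_for_data_parallel(module: torch.nn.Module, rank: int) -> torch.nn.Module:
        # Old pattern that triggers the FutureWarning seen in this log:
        #   FSDP(module, sharding_strategy=ShardingStrategy.NO_SHARD, device_id=...)
        # NO_SHARD keeps full parameters on every rank, so plain DDP is the
        # replacement the warning suggests.
        torch.cuda.set_device(rank)  # explicit device index, avoids the device_id warning
        return DDP(module.cuda(rank), device_ids=[rank])

The AccumulateGrad stream-mismatch UserWarning above likewise states its own opt-out: torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False), for cases where the mismatch is known to be intentional.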
2025-12-04T12:36:07.5689287Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-d4cf9f3cb356e0a3.xml 2025-12-04T12:36:07.5689636Z ============================= test session starts ============================== 2025-12-04T12:36:07.5689852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5690046Z cachedir: .pytest_cache 2025-12-04T12:36:07.5690271Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5690515Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5690635Z configfile: pytest.ini 2025-12-04T12:36:07.5690865Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5691135Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5691478Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5691756Z Running 1 items in this shard 2025-12-04T12:36:07.5691831Z 2025-12-04T12:36:07.5692126Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda I1204 12:34:31.236000 327919 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 327988 2025-12-04T12:36:07.5692602Z I1204 12:34:31.237000 327919 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 327989 2025-12-04T12:36:07.5693290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5693879Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5694460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5695065Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5695453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5695823Z return func(*args, **kwargs) 2025-12-04T12:36:07.5696210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5696568Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5696923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5697279Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5697627Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5697970Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5698290Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5698630Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5700047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5701469Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5702878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5704281Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5704587Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5704928Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5705448Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5705942Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5706421Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5706871Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5707313Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5707778Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5708245Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5708706Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5709170Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5709659Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5710114Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5710609Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5711251Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 
2025-12-04T12:36:07.5711858Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5712211Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5712780Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5713262Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5713622Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5714034Z [rank0]:E1204 12:34:39.756000 327988 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5714300Z dist init r=0, world=2 2025-12-04T12:36:07.5714502Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5714838Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5715340Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5715821Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5716298Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5716744Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5717180Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5717649Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5718112Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5718588Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5719054Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5719507Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5720032Z 
[rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5720501Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5721145Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 1. CUDA driver allocated memory was 1864368128 and is now 3388997632. 2025-12-04T12:36:07.5721748Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5722096Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5722668Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5723147Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5723512Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5723940Z [rank1]:E1204 12:34:39.759000 327989 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5724182Z dist init r=1, world=2 2025-12-04T12:36:07.5724597Z [rank0]:[W1204 12:34:39.936644883 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5725004Z FAILED [10.3183s] [100%] 2025-12-04T12:36:07.5725071Z 2025-12-04T12:36:07.5725132Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5725319Z _____________ TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda _____________ 2025-12-04T12:36:07.5725493Z Traceback (most recent call last): 2025-12-04T12:36:07.5725738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5725982Z self._join_processes(fn) 2025-12-04T12:36:07.5726227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5726492Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5726756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5727014Z raise RuntimeError(error) 2025-12-04T12:36:07.5727166Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5727329Z Traceback (most recent call last): 2025-12-04T12:36:07.5727570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5727815Z getattr(self, test_name)() 2025-12-04T12:36:07.5728045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5728279Z fn() 2025-12-04T12:36:07.5728484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5728715Z method(*args, **kwargs) 2025-12-04T12:36:07.5728961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5729194Z method(*args, **kwargs) 2025-12-04T12:36:07.5729409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5729674Z with policy(): 2025-12-04T12:36:07.5729887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5730121Z raise RuntimeError(msg) 2025-12-04T12:36:07.5730518Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 2025-12-04T12:36:07.5730880Z 2025-12-04T12:36:07.5730954Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5731274Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5731513Z 2025-12-04T12:36:07.5731599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5731724Z 2025-12-04T12:36:07.5731725Z 2025-12-04T12:36:07.5731800Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5732012Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5732385Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-d4cf9f3cb356e0a3.xml - 2025-12-04T12:36:07.5732725Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5733068Z FAILED [10.3183s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5733372Z Traceback (most recent call last): 2025-12-04T12:36:07.5733614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5733853Z getattr(self, test_name)() 2025-12-04T12:36:07.5734081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5734311Z fn() 2025-12-04T12:36:07.5734509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5734735Z method(*args, **kwargs) 2025-12-04T12:36:07.5734951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5735177Z method(*args, **kwargs) 2025-12-04T12:36:07.5735390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5735613Z with policy(): 2025-12-04T12:36:07.5735822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5736049Z raise RuntimeError(msg) 2025-12-04T12:36:07.5736440Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 29696 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 2025-12-04T12:36:07.5736800Z 2025-12-04T12:36:07.5736873Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5737192Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5737433Z 2025-12-04T12:36:07.5737552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5737740Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5737899Z ======================= 1 failed, 3 deselected in 10.33s ======================= 2025-12-04T12:36:07.5738031Z Got exit code 1 2025-12-04T12:36:07.5738122Z Retrying single test... 
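Two other warnings recur on every retry: barrier() noting that it is "using the device under current context", and ProcessGroupNCCL complaining that destroy_process_group() was never called before program exit. A hedged sketch of the init/teardown shape those messages suggest follows; the backend, rank, and world size are illustrative, and an env:// rendezvous (MASTER_ADDR/MASTER_PORT already set) is assumed.

    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int) -> None:
        # Recent PyTorch accepts device_id here; the barrier() warning points at it.
        dist.init_process_group(
            backend="nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            # Addresses the "destroy_process_group() was not called" warning.
            dist.destroy_process_group()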
2025-12-04T12:36:07.5738385Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e01ef7e7dcdba8b3.xml 2025-12-04T12:36:07.5738680Z ============================= test session starts ============================== 2025-12-04T12:36:07.5738886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5739070Z cachedir: .pytest_cache 2025-12-04T12:36:07.5739295Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5739530Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5739684Z configfile: pytest.ini 2025-12-04T12:36:07.5739906Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5740171Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5740473Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5740763Z Running 1 items in this shard 2025-12-04T12:36:07.5740833Z 2025-12-04T12:36:07.5741122Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda I1204 12:34:44.057000 328155 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 328224 2025-12-04T12:36:07.5741604Z I1204 12:34:44.058000 328155 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 328225 2025-12-04T12:36:07.5742279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5742858Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5743436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5744012Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5744395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5744755Z return func(*args, **kwargs) 2025-12-04T12:36:07.5745105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5745456Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5745807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5746162Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5746537Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5746876Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5747198Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:246: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5747531Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5748845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5750301Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5751710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5753121Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5753420Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5753758Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5754241Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5754714Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5755188Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5755628Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5756097Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5756554Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5757016Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5757475Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5757935Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5758383Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5758833Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5759300Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5759986Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 
2025-12-04T12:36:07.5760600Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5760941Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5761501Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5761979Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5762337Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5762748Z [rank0]:E1204 12:34:52.567000 328224 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5762988Z dist init r=0, world=2 2025-12-04T12:36:07.5763185Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5763514Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5763992Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5764465Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5764960Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5765403Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5765835Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5766292Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5766749Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5767201Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5767659Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5768105Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5768553Z 
[rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5769022Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5769693Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 1. CUDA driver allocated memory was 1864368128 and is now 3388997632. 2025-12-04T12:36:07.5770298Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5770642Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5771205Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5771680Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5772044Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5772454Z [rank1]:E1204 12:34:52.569000 328225 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5772691Z dist init r=1, world=2 2025-12-04T12:36:07.5773084Z [rank0]:[W1204 12:34:52.753575413 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5773488Z FAILED [10.3183s] [100%] 2025-12-04T12:36:07.5773551Z 2025-12-04T12:36:07.5773608Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5773789Z _____________ TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda _____________ 2025-12-04T12:36:07.5773958Z Traceback (most recent call last): 2025-12-04T12:36:07.5774225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5774465Z self._join_processes(fn) 2025-12-04T12:36:07.5774706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5774964Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5775228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5775486Z raise RuntimeError(error) 2025-12-04T12:36:07.5775631Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5775787Z Traceback (most recent call last): 2025-12-04T12:36:07.5776020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5776262Z getattr(self, test_name)() 2025-12-04T12:36:07.5776490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5776717Z fn() 2025-12-04T12:36:07.5776915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5777142Z method(*args, **kwargs) 2025-12-04T12:36:07.5777360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5777601Z method(*args, **kwargs) 2025-12-04T12:36:07.5777815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5778037Z with policy(): 2025-12-04T12:36:07.5778246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5778491Z raise RuntimeError(msg) 2025-12-04T12:36:07.5778882Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 2025-12-04T12:36:07.5779241Z 2025-12-04T12:36:07.5779314Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5779662Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5779903Z 2025-12-04T12:36:07.5779989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5780113Z 2025-12-04T12:36:07.5780115Z 2025-12-04T12:36:07.5780191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5780387Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5780759Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e01ef7e7dcdba8b3.xml - 2025-12-04T12:36:07.5781094Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5781416Z FAILED [10.3183s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5781718Z Traceback (most recent call last): 2025-12-04T12:36:07.5781956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5782200Z getattr(self, test_name)() 2025-12-04T12:36:07.5782427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5782721Z fn() 2025-12-04T12:36:07.5782921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5783147Z method(*args, **kwargs) 2025-12-04T12:36:07.5783363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5783591Z method(*args, **kwargs) 2025-12-04T12:36:07.5783803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5784027Z with policy(): 2025-12-04T12:36:07.5784236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5784465Z raise RuntimeError(msg) 2025-12-04T12:36:07.5784855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda! Caching allocator allocated memory was 512 and is now reported as 30720 on device 0. CUDA driver allocated memory was 2019557376 and is now 3544186880. 2025-12-04T12:36:07.5785217Z 2025-12-04T12:36:07.5785291Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5785602Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5785841Z 2025-12-04T12:36:07.5785947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5786128Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
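The RuntimeError itself comes from the harness's CUDA memory-leak check (the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 flag in the repro command), which compares caching-allocator and driver allocations before and after the test body. A rough stand-alone illustration of that before/after comparison, not the actual CudaMemoryLeakCheck implementation in common_utils.py:

    import gc
    import torch

    def assert_no_cuda_leak(fn, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)  # caching-allocator bytes in use
        fn()
        gc.collect()  # drop tensors kept alive only by lingering Python references
        torch.cuda.synchronize(device)
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak: allocated memory was {before} and is now {after} on device {device}"
            )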
2025-12-04T12:36:07.5786285Z ======================= 1 failed, 3 deselected in 10.33s ======================= 2025-12-04T12:36:07.5786419Z Got exit code 1 2025-12-04T12:36:07.5786627Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda 2025-12-04T12:36:07.5786953Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:36:07.5787312Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9f197ebd66670f53.xml 2025-12-04T12:36:07.5787599Z ============================= test session starts ============================== 2025-12-04T12:36:07.5787803Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5787988Z cachedir: .pytest_cache 2025-12-04T12:36:07.5788206Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5788439Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5788551Z configfile: pytest.ini 2025-12-04T12:36:07.5788772Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5789039Z collecting ... collected 4 items / 2 deselected / 2 selected 2025-12-04T12:36:07.5789193Z stepcurrent: skipping 2 already run items. 2025-12-04T12:36:07.5789318Z Running 2 items in this shard 2025-12-04T12:36:07.5789389Z 2025-12-04T12:36:07.5789711Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda I1204 12:34:56.791000 328391 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 328460 2025-12-04T12:36:07.5790171Z I1204 12:34:56.791000 328391 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 328461 2025-12-04T12:36:07.5790882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5791466Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5792046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5792619Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5793004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5793365Z return func(*args, **kwargs) 2025-12-04T12:36:07.5793715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5794070Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5794420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5794787Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5795125Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5795462Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5795785Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5796131Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5797444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5798850Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5800317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5801711Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5802009Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5802344Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5802825Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5803299Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5803769Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5804208Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5804658Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5805116Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5805599Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5806056Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5806510Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5806958Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5807407Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5807866Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5808493Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 
2025-12-04T12:36:07.5809078Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5809419Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5810049Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5810520Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5810878Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5811290Z [rank0]:E1204 12:35:03.835000 328460 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5811528Z dist init r=0, world=2 2025-12-04T12:36:07.5811726Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5812058Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5812540Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5813014Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5813488Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5813945Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5814377Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5814856Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5815313Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5815767Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5816226Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5816673Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5817127Z 
[rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5817590Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5818223Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 1. CUDA driver allocated memory was 1864368128 and is now 3370123264. 2025-12-04T12:36:07.5818810Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5819175Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5819770Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5820233Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5820592Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5820999Z [rank1]:E1204 12:35:03.839000 328461 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5821235Z dist init r=1, world=2 2025-12-04T12:36:07.5821632Z [rank0]:[W1204 12:35:04.005838154 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5822035Z FAILED [8.9183s] [ 50%] 2025-12-04T12:36:07.5822097Z 2025-12-04T12:36:07.5822152Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5822331Z ________________ TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda ________________ 2025-12-04T12:36:07.5822511Z Traceback (most recent call last): 2025-12-04T12:36:07.5822750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5822989Z self._join_processes(fn) 2025-12-04T12:36:07.5823232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5823506Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5823770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5824026Z raise RuntimeError(error) 2025-12-04T12:36:07.5824171Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5824328Z Traceback (most recent call last): 2025-12-04T12:36:07.5824561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5824798Z getattr(self, test_name)() 2025-12-04T12:36:07.5825027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5825253Z fn() 2025-12-04T12:36:07.5825452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5825678Z method(*args, **kwargs) 2025-12-04T12:36:07.5825896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5826122Z method(*args, **kwargs) 2025-12-04T12:36:07.5826337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5826558Z with policy(): 2025-12-04T12:36:07.5826766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5826994Z raise RuntimeError(msg) 2025-12-04T12:36:07.5827374Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 2025-12-04T12:36:07.5827723Z 2025-12-04T12:36:07.5827795Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5828132Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5828362Z 2025-12-04T12:36:07.5828448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5828571Z 2025-12-04T12:36:07.5828573Z 2025-12-04T12:36:07.5828648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5828842Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5829206Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9f197ebd66670f53.xml - 2025-12-04T12:36:07.5829541Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5829884Z FAILED [8.9183s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5830174Z Traceback (most recent call last): 2025-12-04T12:36:07.5830413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5830651Z getattr(self, test_name)() 2025-12-04T12:36:07.5830880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5831124Z fn() 2025-12-04T12:36:07.5831320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5831544Z method(*args, **kwargs) 2025-12-04T12:36:07.5831760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5832006Z method(*args, **kwargs) 2025-12-04T12:36:07.5832221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5832443Z with policy(): 2025-12-04T12:36:07.5832649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5832877Z raise RuntimeError(msg) 2025-12-04T12:36:07.5833263Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 2025-12-04T12:36:07.5833613Z 2025-12-04T12:36:07.5833686Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5833990Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5834220Z 2025-12-04T12:36:07.5834307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5834490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5834649Z ======================= 1 failed, 2 deselected in 8.93s ======================== 2025-12-04T12:36:07.5834781Z Got exit code 1 2025-12-04T12:36:07.5834873Z Retrying single test... 
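Note on the repeated UserWarning above from torch/distributed/fsdp/_init_utils.py: FSDP received `device_id` as a bare "cuda" device with no index, so it falls back to the current device for each rank. The warning itself suggests either calling `torch.cuda.set_device()` before FSDP initialization or passing an explicit device index. A minimal sketch of that suggested pattern (illustrative only; `model` and `rank` are placeholder names, not the test's actual code, and the default process group is assumed to be initialized already):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model, rank):
        # Pin this process to its GPU before FSDP initialization, as the warning suggests,
        torch.cuda.set_device(rank)
        # or pass an explicit device index instead of the bare "cuda" device.
        return FSDP(model, device_id=torch.device("cuda", rank))
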
2025-12-04T12:36:07.5835134Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e469e64fd235e843.xml 2025-12-04T12:36:07.5835425Z ============================= test session starts ============================== 2025-12-04T12:36:07.5835630Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5835815Z cachedir: .pytest_cache 2025-12-04T12:36:07.5836064Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5836299Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5836413Z configfile: pytest.ini 2025-12-04T12:36:07.5836634Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5836896Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5837188Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda 2025-12-04T12:36:07.5837450Z Running 1 items in this shard 2025-12-04T12:36:07.5837521Z 2025-12-04T12:36:07.5837797Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda I1204 12:35:08.090000 328627 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 328696 2025-12-04T12:36:07.5838260Z I1204 12:35:08.090000 328627 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 328697 2025-12-04T12:36:07.5838938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5839514Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5840141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5840725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5841109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5841476Z return func(*args, **kwargs) 2025-12-04T12:36:07.5841822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5842177Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5842526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5842879Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5843218Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5843557Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5843876Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5844210Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5845568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5846970Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5848379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5849857Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5850153Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5850505Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5850988Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5851459Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5851930Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5852372Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5852808Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5853264Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5853723Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5853870Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5854144Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5854307Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5854588Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5854734Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5855181Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 
2025-12-04T12:36:07.5855298Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5855493Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5855817Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5855943Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5856153Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5856316Z [rank0]:E1204 12:35:15.131000 328696 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5856369Z dist init r=0, world=2 2025-12-04T12:36:07.5856506Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5856666Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5856954Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5857107Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5857393Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5857517Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5857792Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5857936Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5858211Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5858356Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5858650Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5858787Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5859065Z 
[rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5859212Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5859700Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17920 on device 1. CUDA driver allocated memory was 1864368128 and is now 3370123264. 2025-12-04T12:36:07.5859814Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5860008Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5860327Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5860455Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5860679Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5860842Z [rank1]:E1204 12:35:15.132000 328697 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5860880Z dist init r=1, world=2 2025-12-04T12:36:07.5861216Z [rank0]:[W1204 12:35:15.300778733 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5861256Z FAILED [8.8153s] [100%] 2025-12-04T12:36:07.5861258Z 2025-12-04T12:36:07.5861314Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5861402Z ________________ TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda ________________ 2025-12-04T12:36:07.5861448Z Traceback (most recent call last): 2025-12-04T12:36:07.5861611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5861654Z self._join_processes(fn) 2025-12-04T12:36:07.5861826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5861878Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5862055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5862099Z raise RuntimeError(error) 2025-12-04T12:36:07.5862176Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5862221Z Traceback (most recent call last): 2025-12-04T12:36:07.5862381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5862424Z getattr(self, test_name)() 2025-12-04T12:36:07.5862604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5862638Z fn() 2025-12-04T12:36:07.5862790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5862829Z method(*args, **kwargs) 2025-12-04T12:36:07.5862980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5863019Z method(*args, **kwargs) 2025-12-04T12:36:07.5863169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5863205Z with policy(): 2025-12-04T12:36:07.5863359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5863402Z raise RuntimeError(msg) 2025-12-04T12:36:07.5863716Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 2025-12-04T12:36:07.5863719Z 2025-12-04T12:36:07.5863795Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5863993Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5864009Z 2025-12-04T12:36:07.5864098Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5864100Z 2025-12-04T12:36:07.5864102Z 2025-12-04T12:36:07.5864176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5864274Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5864522Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-e469e64fd235e843.xml - 2025-12-04T12:36:07.5864583Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5864799Z FAILED [8.8153s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5864845Z Traceback (most recent call last): 2025-12-04T12:36:07.5865010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5865052Z getattr(self, test_name)() 2025-12-04T12:36:07.5865211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5865246Z fn() 2025-12-04T12:36:07.5865399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5865440Z method(*args, **kwargs) 2025-12-04T12:36:07.5865591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5865629Z method(*args, **kwargs) 2025-12-04T12:36:07.5865779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5865816Z with policy(): 2025-12-04T12:36:07.5865969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5866008Z raise RuntimeError(msg) 2025-12-04T12:36:07.5866352Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 2025-12-04T12:36:07.5866356Z 2025-12-04T12:36:07.5866430Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5866628Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5866631Z 2025-12-04T12:36:07.5866717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5866780Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5866841Z ======================= 1 failed, 3 deselected in 8.82s ======================== 2025-12-04T12:36:07.5866878Z Got exit code 1 2025-12-04T12:36:07.5866919Z Retrying single test... 
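Note on the ProcessGroupNCCL warning above: the worker processes exited without calling destroy_process_group(), which the warning says can leak resources. A minimal sketch of the cleanup it asks for (a sketch only, not the harness's actual setup; rendezvous via MASTER_ADDR/MASTER_PORT environment variables is assumed):

    import torch.distributed as dist

    def run(rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # test body / training step goes here
        finally:
            # Explicit teardown; this is the call the warning reports as missing.
            dist.destroy_process_group()
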
2025-12-04T12:36:07.5867118Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9ce64f8147a99674.xml 2025-12-04T12:36:07.5867179Z ============================= test session starts ============================== 2025-12-04T12:36:07.5867290Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5867331Z cachedir: .pytest_cache 2025-12-04T12:36:07.5867487Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5867533Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5867585Z configfile: pytest.ini 2025-12-04T12:36:07.5867753Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5867823Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5868016Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda 2025-12-04T12:36:07.5868071Z Running 1 items in this shard 2025-12-04T12:36:07.5868073Z 2025-12-04T12:36:07.5868351Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda I1204 12:35:19.306000 328863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 328932 2025-12-04T12:36:07.5868508Z I1204 12:35:19.306000 328863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 328933 2025-12-04T12:36:07.5869002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5869067Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5869556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5869641Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5869932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T12:36:07.5869976Z return func(*args, **kwargs) 2025-12-04T12:36:07.5870260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5870331Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5870609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5870651Z return fsdp_fn(module, **kwargs) 2025-12-04T12:36:07.5870919Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5870957Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5871219Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_fsdp_fine_tune.py:298: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T12:36:07.5871256Z fsdp_seq = FSDP( 2025-12-04T12:36:07.5872519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:36:07.5872676Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5873928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:36:07.5874049Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:36:07.5874193Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5874353Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5874643Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5874797Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5875107Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5875233Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5875507Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5875655Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5875930Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5876079Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5876353Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5876489Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5876780Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5876927Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5877384Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17920 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 
2025-12-04T12:36:07.5877498Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5877693Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5878015Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5878129Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5878338Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5878501Z [rank0]:E1204 12:35:26.315000 328932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5878540Z dist init r=0, world=2 2025-12-04T12:36:07.5878676Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5878836Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5879120Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5879294Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5879614Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5879737Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5880015Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5880161Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5880438Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5880582Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5880857Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5881012Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5881286Z 
[rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5881451Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5881890Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 1. CUDA driver allocated memory was 1864368128 and is now 3370123264. 2025-12-04T12:36:07.5882006Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5882198Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5882524Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5882636Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5882844Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5883009Z [rank1]:E1204 12:35:26.321000 328933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5883046Z dist init r=1, world=2 2025-12-04T12:36:07.5883380Z [rank0]:[W1204 12:35:26.473645116 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5883453Z FAILED [8.8173s] [100%] 2025-12-04T12:36:07.5883455Z 2025-12-04T12:36:07.5883512Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5883600Z ________________ TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda ________________ 2025-12-04T12:36:07.5883648Z Traceback (most recent call last): 2025-12-04T12:36:07.5883809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5883855Z self._join_processes(fn) 2025-12-04T12:36:07.5884026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5884081Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5884259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5884304Z raise RuntimeError(error) 2025-12-04T12:36:07.5884384Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5884427Z Traceback (most recent call last): 2025-12-04T12:36:07.5884588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5884629Z getattr(self, test_name)() 2025-12-04T12:36:07.5884787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5884835Z fn() 2025-12-04T12:36:07.5884987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5885026Z method(*args, **kwargs) 2025-12-04T12:36:07.5885177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5885227Z method(*args, **kwargs) 2025-12-04T12:36:07.5885378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5885415Z with policy(): 2025-12-04T12:36:07.5885567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5885607Z raise RuntimeError(msg) 2025-12-04T12:36:07.5885922Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17920 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 
2025-12-04T12:36:07.5885925Z 2025-12-04T12:36:07.5885999Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5886197Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5886201Z 2025-12-04T12:36:07.5886289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5886291Z 2025-12-04T12:36:07.5886348Z Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5886394Z Traceback (most recent call last): 2025-12-04T12:36:07.5886555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5886598Z getattr(self, test_name)() 2025-12-04T12:36:07.5886755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5886790Z fn() 2025-12-04T12:36:07.5886942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5886981Z method(*args, **kwargs) 2025-12-04T12:36:07.5887152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5887193Z method(*args, **kwargs) 2025-12-04T12:36:07.5887341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5887379Z with policy(): 2025-12-04T12:36:07.5887531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5887573Z raise RuntimeError(msg) 2025-12-04T12:36:07.5887887Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 1. CUDA driver allocated memory was 1864368128 and is now 3370123264. 2025-12-04T12:36:07.5887889Z 2025-12-04T12:36:07.5887961Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5888158Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5888161Z 2025-12-04T12:36:07.5888246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5888248Z 2025-12-04T12:36:07.5888250Z 2025-12-04T12:36:07.5888326Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5888411Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5888675Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-9ce64f8147a99674.xml - 2025-12-04T12:36:07.5888734Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5888949Z FAILED [8.8173s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5889005Z Traceback (most recent call last): 2025-12-04T12:36:07.5889171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5889211Z getattr(self, test_name)() 2025-12-04T12:36:07.5889370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5889404Z fn() 2025-12-04T12:36:07.5889555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5889674Z method(*args, **kwargs) 2025-12-04T12:36:07.5889828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5889869Z method(*args, **kwargs) 2025-12-04T12:36:07.5890021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5890059Z with policy(): 2025-12-04T12:36:07.5890209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5890251Z raise RuntimeError(msg) 2025-12-04T12:36:07.5890566Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17920 on device 0. CUDA driver allocated memory was 2019557376 and is now 3525312512. 
2025-12-04T12:36:07.5890569Z 2025-12-04T12:36:07.5890644Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5890841Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5890844Z 2025-12-04T12:36:07.5890959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5890961Z 2025-12-04T12:36:07.5891020Z Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5891066Z Traceback (most recent call last): 2025-12-04T12:36:07.5891227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5891269Z getattr(self, test_name)() 2025-12-04T12:36:07.5891426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5891461Z fn() 2025-12-04T12:36:07.5891612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5891650Z method(*args, **kwargs) 2025-12-04T12:36:07.5891800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5891840Z method(*args, **kwargs) 2025-12-04T12:36:07.5891990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5892027Z with policy(): 2025-12-04T12:36:07.5892179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5892219Z raise RuntimeError(msg) 2025-12-04T12:36:07.5892532Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda! Caching allocator allocated memory was 512 and is now reported as 17408 on device 1. CUDA driver allocated memory was 1864368128 and is now 3370123264. 2025-12-04T12:36:07.5892549Z 2025-12-04T12:36:07.5892622Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5892817Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_ddp_cuda 2025-12-04T12:36:07.5892833Z 2025-12-04T12:36:07.5892919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5892986Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:36:07.5893049Z ======================= 1 failed, 3 deselected in 8.83s ======================== 2025-12-04T12:36:07.5893086Z Got exit code 1 2025-12-04T12:36:07.5893235Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda 2025-12-04T12:36:07.5893362Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:36:07.5893561Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-0015e4972511b8d3.xml 2025-12-04T12:36:07.5893618Z ============================= test session starts ============================== 2025-12-04T12:36:07.5893734Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5893774Z cachedir: .pytest_cache 2025-12-04T12:36:07.5893933Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5893978Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5894019Z configfile: pytest.ini 2025-12-04T12:36:07.5894181Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5894254Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5894306Z stepcurrent: skipping 3 already run items. 2025-12-04T12:36:07.5894350Z Running 1 items in this shard 2025-12-04T12:36:07.5894352Z 2025-12-04T12:36:07.5894666Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda I1204 12:35:30.535000 329099 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 329168 2025-12-04T12:36:07.5894822Z I1204 12:35:30.536000 329099 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 329169 2025-12-04T12:36:07.5895319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5895381Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5895870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5895929Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5896223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:36:07.5896266Z return func(*args, **kwargs) 2025-12-04T12:36:07.5896421Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5896581Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5896871Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5897043Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5897325Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5897451Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5897731Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5897881Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5898155Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5898302Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5898575Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5898711Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5899007Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5899154Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5899646Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 
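
The UserWarning from _init_utils.py printed at the start of each session above is advisory, not the failure itself: FSDP received device_id "cuda" with no index and fell back to the current device. As the warning suggests, the fix is to call torch.cuda.set_device() before constructing FSDP or to pass an explicit device index. A minimal sketch of that fix, with model and rank as placeholder names:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def shard_model(model, rank):
    # Make the per-rank device explicit so FSDP does not have to guess it.
    torch.cuda.set_device(rank)
    return FSDP(model, device_id=torch.device("cuda", rank))
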
2025-12-04T12:36:07.5899765Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5899960Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5900305Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5900418Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5900628Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5900807Z [rank1]:E1204 12:35:38.382000 329169 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5900847Z dist init r=1, world=2 2025-12-04T12:36:07.5900983Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5901157Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5901442Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5901593Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5901877Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5902000Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5902281Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5902427Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5902703Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5902849Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5903124Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5903283Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:36:07.5903559Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5903707Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5904167Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 2019557376 and is now 3495952384. 2025-12-04T12:36:07.5904285Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5904480Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5904820Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5904942Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5905151Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5905327Z [rank0]:E1204 12:35:38.428000 329168 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5905365Z dist init r=0, world=2 2025-12-04T12:36:07.5905700Z [rank0]:[W1204 12:35:38.710265873 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5905738Z FAILED [9.6167s] [100%] 2025-12-04T12:36:07.5905740Z 2025-12-04T12:36:07.5905796Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5905890Z __________ TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda __________ 2025-12-04T12:36:07.5905937Z Traceback (most recent call last): 2025-12-04T12:36:07.5906099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5906143Z self._join_processes(fn) 2025-12-04T12:36:07.5906319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5906372Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5906550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5906592Z raise RuntimeError(error) 2025-12-04T12:36:07.5906671Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5906717Z Traceback (most recent call last): 2025-12-04T12:36:07.5906878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5906920Z getattr(self, test_name)() 2025-12-04T12:36:07.5907078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5907114Z fn() 2025-12-04T12:36:07.5907284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5907324Z method(*args, **kwargs) 2025-12-04T12:36:07.5907475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5907514Z method(*args, **kwargs) 2025-12-04T12:36:07.5907665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5907703Z with policy(): 2025-12-04T12:36:07.5907856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5907897Z raise RuntimeError(msg) 2025-12-04T12:36:07.5908231Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 2025-12-04T12:36:07.5908235Z 2025-12-04T12:36:07.5908309Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5908524Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5908526Z 2025-12-04T12:36:07.5908614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5908626Z 2025-12-04T12:36:07.5908628Z 2025-12-04T12:36:07.5908701Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5908789Z Process 1 terminated with exit code 10, terminating remaining processes. 
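
The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") is about cleanup at exit rather than the leak check itself, but it points at the same resource-hygiene issue. A minimal sketch of the shutdown the linked docs recommend; the init arguments and structure are placeholders:

import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")   # rendezvous via the usual env:// variables
    try:
        ...  # test or training body
    finally:
        dist.destroy_process_group()          # explicit shutdown avoids the warning above
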
2025-12-04T12:36:07.5909034Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-0015e4972511b8d3.xml - 2025-12-04T12:36:07.5909106Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5909334Z FAILED [9.6167s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5909380Z Traceback (most recent call last): 2025-12-04T12:36:07.5909544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5909628Z getattr(self, test_name)() 2025-12-04T12:36:07.5909787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5909821Z fn() 2025-12-04T12:36:07.5909972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5910012Z method(*args, **kwargs) 2025-12-04T12:36:07.5910165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5910204Z method(*args, **kwargs) 2025-12-04T12:36:07.5910355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5910391Z with policy(): 2025-12-04T12:36:07.5910543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5910583Z raise RuntimeError(msg) 2025-12-04T12:36:07.5910913Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 2025-12-04T12:36:07.5910917Z 2025-12-04T12:36:07.5911013Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5911231Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5911233Z 2025-12-04T12:36:07.5911319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5911382Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5911444Z ======================= 1 failed, 3 deselected in 9.63s ======================== 2025-12-04T12:36:07.5911484Z Got exit code 1 2025-12-04T12:36:07.5911525Z Retrying single test... 
2025-12-04T12:36:07.5911727Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-0e0648c6f1ca5aaf.xml 2025-12-04T12:36:07.5911785Z ============================= test session starts ============================== 2025-12-04T12:36:07.5911898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5911941Z cachedir: .pytest_cache 2025-12-04T12:36:07.5912098Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5912145Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5912185Z configfile: pytest.ini 2025-12-04T12:36:07.5912347Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5912436Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5912646Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5912689Z Running 1 items in this shard 2025-12-04T12:36:07.5912703Z 2025-12-04T12:36:07.5912995Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda I1204 12:35:42.687000 329335 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 329404 2025-12-04T12:36:07.5913148Z I1204 12:35:42.688000 329335 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 329405 2025-12-04T12:36:07.5913644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5916312Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5916816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5916879Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5917172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:36:07.5917219Z return func(*args, **kwargs) 2025-12-04T12:36:07.5917363Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5917527Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5917854Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5918010Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5918296Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5918422Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5918699Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5918850Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5919127Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5919274Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5919563Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5919741Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5920042Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5920192Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5920651Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 2019557376 and is now 3495952384. 
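
The other recurring UserWarning, from c10d_logger.py ("barrier(): using the device under current context"), can be silenced the way the message suggests: bind the process group to an explicit device at init time. A sketch assuming a torch.distributed version that accepts device_id in init_process_group and env-var rendezvous; rank is a placeholder:

import torch
import torch.distributed as dist

def init_distributed(rank):
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", rank),  # ties collectives such as barrier() to this device
    )
    dist.barrier()
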
2025-12-04T12:36:07.5920769Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5920964Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5921310Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5921424Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5921637Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5921802Z [rank0]:E1204 12:35:50.629000 329404 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5921842Z dist init r=0, world=2 2025-12-04T12:36:07.5922006Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5922165Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5922452Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5922605Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5922889Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5923016Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5923297Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5923443Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5923721Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5923881Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5924157Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5924304Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:36:07.5924582Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5924730Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5925186Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 2025-12-04T12:36:07.5925299Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5925493Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5925831Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5925944Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5926153Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5926336Z [rank1]:E1204 12:35:50.636000 329405 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5926376Z dist init r=1, world=2 2025-12-04T12:36:07.5926714Z [rank0]:[W1204 12:35:50.800091060 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5926755Z FAILED [9.7170s] [100%] 2025-12-04T12:36:07.5926757Z 2025-12-04T12:36:07.5926811Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5926907Z __________ TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda __________ 2025-12-04T12:36:07.5926953Z Traceback (most recent call last): 2025-12-04T12:36:07.5927118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5927163Z self._join_processes(fn) 2025-12-04T12:36:07.5927336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5927388Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5927567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5927622Z raise RuntimeError(error) 2025-12-04T12:36:07.5927701Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5927745Z Traceback (most recent call last): 2025-12-04T12:36:07.5927906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5927947Z getattr(self, test_name)() 2025-12-04T12:36:07.5928117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5928151Z fn() 2025-12-04T12:36:07.5928304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5928346Z method(*args, **kwargs) 2025-12-04T12:36:07.5928496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5928536Z method(*args, **kwargs) 2025-12-04T12:36:07.5928686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5928723Z with policy(): 2025-12-04T12:36:07.5928876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5928918Z raise RuntimeError(msg) 2025-12-04T12:36:07.5929251Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 2019557376 and is now 3495952384. 2025-12-04T12:36:07.5929254Z 2025-12-04T12:36:07.5929329Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5929544Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5929548Z 2025-12-04T12:36:07.5929678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5929680Z 2025-12-04T12:36:07.5929682Z 2025-12-04T12:36:07.5929759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5929845Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5930124Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-0e0648c6f1ca5aaf.xml - 2025-12-04T12:36:07.5930184Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5930418Z FAILED [9.7170s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:36:07.5930462Z Traceback (most recent call last): 2025-12-04T12:36:07.5930627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5930669Z getattr(self, test_name)() 2025-12-04T12:36:07.5930830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5930864Z fn() 2025-12-04T12:36:07.5931019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5931058Z method(*args, **kwargs) 2025-12-04T12:36:07.5931208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5931246Z method(*args, **kwargs) 2025-12-04T12:36:07.5931398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5931449Z with policy(): 2025-12-04T12:36:07.5931602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5931641Z raise RuntimeError(msg) 2025-12-04T12:36:07.5931972Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 2019557376 and is now 3495952384. 2025-12-04T12:36:07.5931989Z 2025-12-04T12:36:07.5932064Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5932278Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5932280Z 2025-12-04T12:36:07.5932365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5932427Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:36:07.5932490Z ======================= 1 failed, 3 deselected in 9.73s ======================== 2025-12-04T12:36:07.5932527Z Got exit code 1 2025-12-04T12:36:07.5932566Z Retrying single test... 
2025-12-04T12:36:07.5932770Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-5ade9bc37ba34ad9.xml 2025-12-04T12:36:07.5932831Z ============================= test session starts ============================== 2025-12-04T12:36:07.5932943Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5932984Z cachedir: .pytest_cache 2025-12-04T12:36:07.5933140Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5933187Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5933227Z configfile: pytest.ini 2025-12-04T12:36:07.5933393Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5933463Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T12:36:07.5933673Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5933719Z Running 1 items in this shard 2025-12-04T12:36:07.5933741Z 2025-12-04T12:36:07.5934029Z distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda I1204 12:35:54.799000 329571 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 329640 2025-12-04T12:36:07.5934182Z I1204 12:35:54.800000 329571 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 329641 2025-12-04T12:36:07.5934684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5934747Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5935232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T12:36:07.5935293Z device_from_device_id = _get_device_from_device_id( 2025-12-04T12:36:07.5935588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T12:36:07.5935642Z return func(*args, **kwargs) 2025-12-04T12:36:07.5935786Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5935959Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5936248Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5936403Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5936689Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5936814Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5937094Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5937242Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5937517Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5937666Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5937943Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5938100Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:36:07.5938377Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5938524Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5938980Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 
2025-12-04T12:36:07.5939095Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5939293Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5939670Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5939797Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5940007Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5940171Z [rank1]:E1204 12:36:02.720000 329641 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:36:07.5940228Z dist init r=1, world=2 2025-12-04T12:36:07.5940363Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:36:07.5940522Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:36:07.5940808Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5940962Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:36:07.5941242Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5941367Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:36:07.5941646Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5941791Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5942066Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5942211Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:36:07.5942514Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5942650Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:36:07.5942926Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5943073Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:36:07.5943527Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 0. CUDA driver allocated memory was 2019557376 and is now 3495952384. 2025-12-04T12:36:07.5943644Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5943839Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5944189Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5944301Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:36:07.5944524Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5944689Z [rank0]:E1204 12:36:02.781000 329640 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:36:07.5944727Z dist init r=0, world=2 2025-12-04T12:36:07.5945062Z [rank0]:[W1204 12:36:03.056022483 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T12:36:07.5945101Z FAILED [9.7173s] [100%] 2025-12-04T12:36:07.5945103Z 2025-12-04T12:36:07.5945159Z =================================== FAILURES =================================== 2025-12-04T12:36:07.5945251Z __________ TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda __________ 2025-12-04T12:36:07.5945299Z Traceback (most recent call last): 2025-12-04T12:36:07.5945463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:36:07.5945507Z self._join_processes(fn) 2025-12-04T12:36:07.5945678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:36:07.5945732Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:36:07.5945910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:36:07.5945955Z raise RuntimeError(error) 2025-12-04T12:36:07.5946032Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5946077Z Traceback (most recent call last): 2025-12-04T12:36:07.5946238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5946311Z getattr(self, test_name)() 2025-12-04T12:36:07.5946469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5946504Z fn() 2025-12-04T12:36:07.5946655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5946696Z method(*args, **kwargs) 2025-12-04T12:36:07.5946847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5946887Z method(*args, **kwargs) 2025-12-04T12:36:07.5947039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5947075Z with policy(): 2025-12-04T12:36:07.5947227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5947269Z raise RuntimeError(msg) 2025-12-04T12:36:07.5947597Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 2025-12-04T12:36:07.5947599Z 2025-12-04T12:36:07.5947672Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5947886Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5947898Z 2025-12-04T12:36:07.5947984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5947986Z 2025-12-04T12:36:07.5947989Z 2025-12-04T12:36:07.5948063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:36:07.5948161Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:36:07.5948409Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-5ade9bc37ba34ad9.xml - 2025-12-04T12:36:07.5948469Z =========================== short test summary info ============================ 2025-12-04T12:36:07.5948698Z FAILED [9.7173s] distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:36:07.5948744Z Traceback (most recent call last): 2025-12-04T12:36:07.5948908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:36:07.5948951Z getattr(self, test_name)() 2025-12-04T12:36:07.5949112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:36:07.5949149Z fn() 2025-12-04T12:36:07.5949300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5949340Z method(*args, **kwargs) 2025-12-04T12:36:07.5949489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:36:07.5949529Z method(*args, **kwargs) 2025-12-04T12:36:07.5949710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:36:07.5949748Z with policy(): 2025-12-04T12:36:07.5949899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:36:07.5949940Z raise RuntimeError(msg) 2025-12-04T12:36:07.5950301Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda! Caching allocator allocated memory was 512 and is now reported as 35328 on device 1. CUDA driver allocated memory was 1864368128 and is now 3340763136. 2025-12-04T12:36:07.5950306Z 2025-12-04T12:36:07.5950378Z To execute this test, run the following from the base repo dir: 2025-12-04T12:36:07.5950592Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_fine_tune.py TestFSDPFineTuneCUDA.test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5950594Z 2025-12-04T12:36:07.5950682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:36:07.5950744Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
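The leak check behind this failure is the context manager in common_utils.py whose __exit__ appears in the traceback; it is enabled through PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 in the repro command and compares caching-allocator and driver-level memory taken before and after the test body. A rough, simplified approximation of that comparison using public torch.cuda counters (not the internal check itself; run_suspect_test is a hypothetical stand-in):

    import torch

    device = torch.device("cuda:0")
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)     # caching-allocator bytes in use
    free_before, _total = torch.cuda.mem_get_info(device)  # driver-level free memory

    run_suspect_test()  # hypothetical stand-in for the test body

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()  # release unused cached blocks so driver numbers are comparable
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible leak: allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver free {free_before} -> {free_after} bytes"
        )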
2025-12-04T12:36:07.5950804Z ======================= 1 failed, 3 deselected in 9.73s ======================== 2025-12-04T12:36:07.5950841Z Got exit code 1 2025-12-04T12:36:07.5951005Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda 2025-12-04T12:36:07.5951135Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:36:07.5951338Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-43708990400729c0.xml 2025-12-04T12:36:07.5951395Z ============================= test session starts ============================== 2025-12-04T12:36:07.5951508Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:36:07.5951562Z cachedir: .pytest_cache 2025-12-04T12:36:07.5951720Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:36:07.5951765Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:36:07.5951806Z configfile: pytest.ini 2025-12-04T12:36:07.5951982Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:36:07.5952054Z collecting ... collected 4 items / 4 deselected / 0 selected 2025-12-04T12:36:07.5952106Z stepcurrent: skipping 4 already run items. 2025-12-04T12:36:07.5952148Z Running 0 items in this shard 2025-12-04T12:36:07.5952150Z 2025-12-04T12:36:07.5952391Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fine_tune/distributed.fsdp.test_fsdp_fine_tune-43708990400729c0.xml - 2025-12-04T12:36:07.5952450Z ============================ 4 deselected in 0.00s ============================= 2025-12-04T12:36:07.5953052Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_backward_reshard_hooks_cuda', 'test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_hooks_multi_traversal_cuda', 'test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_ddp_cuda', 'test/distributed/fsdp/test_fsdp_fine_tune.py::TestFSDPFineTuneCUDA::test_parity_with_non_frozen_fsdp_cuda'] 2025-12-04T12:36:07.5953055Z 2025-12-04T12:36:07.5953246Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_fine_tune 1/1 (test/test-reports/distributed.fsdp.test_fsdp_fine_tune_1.1_aed87725c804591d_.log) 2025-12-04T12:36:07.5953248Z 2025-12-04T12:36:07.5953376Z Finished distributed/fsdp/test_fsdp_fine_tune 1/1 ... [2025-12-04 12:36:07.546114][5229808.525152349], took 2.41min 2025-12-04T12:36:07.5953641Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:36:07.5953729Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:36:07.5953821Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:36:07.5953871Z Uploading artifacts took 0.00 seconds 2025-12-04T12:36:07.5953928Z distributed/fsdp/test_fsdp_fine_tune 1/1 failed! 2025-12-04T12:36:07.5954061Z Running distributed/test_multi_threaded_pg 1/1 ... 
[2025-12-04 12:36:07.549081][5229808.52812229] 2025-12-04T12:36:07.5954110Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:36:07.5954425Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_multi_threaded_pg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:07.549264] 2025-12-04T12:36:10.0172331Z 2025-12-04T12:36:10.0172708Z distributed/test_multi_threaded_pg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_multi_threaded_pg_1.1_063d4919acdf1ad9_.log 2025-12-04T12:36:10.0176490Z Running 22 items in this shard: test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_all_to_all_single_list, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_all_to_all_single_none, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_all_to_all_single_tensor, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_broadcast_object_list, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_collective_error_on_rank_non_zero, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_collective_error_on_rank_non_zero_all, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_collective_error_on_rank_zero, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithWrapper::test_skip, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_reduce, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_reduce_coalesced, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_reduce_ops, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_all_to_all, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_allgather, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_assert_equal_on_rank, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_broadcast, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_broadcast_object_list, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_bwd_sees_fwd_pg, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_gather, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_reduce_scatter, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_scatter, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_subpg, test/distributed/test_multi_threaded_pg.py::TestCollectivesWithBaseClass::test_using_pg_from_another_thread 2025-12-04T12:36:10.0180481Z 2025-12-04T12:36:10.0180630Z Finished distributed/test_multi_threaded_pg 1/1 ... [2025-12-04 12:36:10.016880][5229810.995918539], took 0.04min 2025-12-04T12:36:10.0184483Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:36:10.0196221Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:36:10.0197714Z Running distributed/_composable/fsdp/test_fully_shard_extensions 1/1 ... 
[2025-12-04 12:36:10.019629][5229810.998670123] 2025-12-04T12:36:10.0198305Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:36:10.0199947Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_extensions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:10.019822] 2025-12-04T12:36:36.1719324Z 2025-12-04T12:36:36.1729770Z distributed/_composable/fsdp/test_fully_shard_extensions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_extensions_1.1_d99f22c17891004c_.log 2025-12-04T12:36:36.1733254Z Running 5 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiProcess::test_all_gather_extensions_train_parity, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extension_hsdp_mesh, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extension_outer_size_stride, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extensions_end_to_end, test/distributed/_composable/fsdp/test_fully_shard_extensions.py::TestFullyShardAllGatherExtensionsMultiThread::test_all_gather_extensions_monkey_patch 2025-12-04T12:36:36.1735426Z 2025-12-04T12:36:36.1735696Z Finished distributed/_composable/fsdp/test_fully_shard_extensions 1/1 ... [2025-12-04 12:36:36.171610][5229837.150646935], took 0.44min 2025-12-04T12:36:36.1739732Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:36:36.1752401Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:36:36.1753028Z Running distributed/checkpoint/test_file_system_checkpoint_cpu 1/1 ... [2025-12-04 12:36:36.175167][5229837.154208038] 2025-12-04T12:36:36.1754654Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:36:36.1755214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_file_system_checkpoint_cpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:36:36.175364] 2025-12-04T12:36:58.5729909Z 2025-12-04T12:36:58.5731054Z distributed/checkpoint/test_file_system_checkpoint_cpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_file_system_checkpoint_cpu_1.1_1a588f4f72c9e6bd_.log 2025-12-04T12:36:58.5739757Z Running 16 items in this shard: test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoad::test_read_write_only_tensor_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoad::test_read_write_only_tensor_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoadRot13::test_read_write_tensor_and_blob_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoadRot13::test_read_write_tensor_and_blob_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoadZStandard::test_read_write_only_tensor_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoadZStandard::test_read_write_only_tensor_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoadWithSharedTensor::test_read_write_shard_tensor_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedStateDictSaveLoadWithSharedTensor::test_read_write_shard_tensor_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_load_rowwise_to_colwise_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_load_rowwise_to_colwise_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_load_with_different_shard_plan_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_load_with_different_shard_plan_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_save_load_bytes_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_save_load_bytes_thread_count_2, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_switch_between_sharded_tensor_to_tensor_thread_count_1, test/distributed/checkpoint/test_file_system_checkpoint_cpu.py::TestDistributedReshardOnLoad::test_switch_between_sharded_tensor_to_tensor_thread_count_2 2025-12-04T12:36:58.5746390Z 2025-12-04T12:36:58.5746636Z Finished distributed/checkpoint/test_file_system_checkpoint_cpu 1/1 ... [2025-12-04 12:36:58.572753][5229859.551788427], took 0.37min 2025-12-04T12:36:58.5750170Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:36:58.5760684Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:36:58.5762695Z Running distributed/fsdp/test_wrap 1/1 ... 
[2025-12-04 12:36:58.576185][5229859.555225602] 2025-12-04T12:36:58.5762925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:36:58.5765121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_wrap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:58.576403] 2025-12-04T12:39:14.6569886Z 2025-12-04T12:39:14.6570996Z distributed/fsdp/test_wrap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_wrap_1.1_3aac12bd02055555_.log 2025-12-04T12:39:14.6588618Z Running 52 items in this shard: test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_bn_always_wrapped_individually, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_error_already_wrapped_nested_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch0_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload0_backward_prefetch1_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch0_forward_prefetch_True_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_False_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_False_device_init_mode1, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_True_device_init_mode0, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_main_wrap_api_cpu_offload1_backward_prefetch1_forward_prefetch_True_device_init_mode1, 
test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_wrap_batchnorm_individually_use_or_policy_False, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_wrap_batchnorm_individually_use_or_policy_True, test/distributed/fsdp/test_wrap.py::TestFSDPWrap::test_zero_argument, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_always_wrap, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_always_wrap_with_ignored_modules_wrap_method0, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_always_wrap_with_ignored_modules_wrap_method1, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_api, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_exclude_wrap, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_exclude_wrap_include_children, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_force_leaf, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_preset_force_leaf_custom, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload0_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload0_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload1_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode0_cpu_offload1_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload0_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload0_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload1_use_device_id_False, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_smoke_test_device_init_mode1_cpu_offload1_use_device_id_True, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_with_ignored_modules_wrap_method0, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_auto_wrap_with_ignored_modules_wrap_method1, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_custom_policy, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_frozen_params, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_module_wrap_policy, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_module_wrap_policy_callable, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_transformer_auto_wrap_policy, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_disabled_outside_context, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_override_defaults, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_wrap_method0, test/distributed/fsdp/test_wrap.py::TestAutoWrap::test_wrap_wrap_method1, test/distributed/fsdp/test_wrap.py::TestWrapUtils::test_validate_frozen_params 2025-12-04T12:39:14.6598511Z 2025-12-04T12:39:14.6598656Z Finished distributed/fsdp/test_wrap 1/1 ... 
[2025-12-04 12:39:14.656643][5229995.635680802], took 2.27min 2025-12-04T12:39:14.6599162Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:39:14.6599704Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:39:14.6600001Z Running distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ... [2025-12-04 12:39:14.659390][5229995.638431366] 2025-12-04T12:39:14.6600248Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:39:14.6600729Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_hsdp_dtensor_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:39:14.659612] 2025-12-04T12:44:29.0254745Z 2025-12-04T12:44:29.0255851Z PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_60de516b7e1e2204_.log) 2025-12-04T12:44:29.0256912Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-cc2b69bb2c652278.xml 2025-12-04T12:44:29.0257685Z ============================= test session starts ============================== 2025-12-04T12:44:29.0258170Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0258579Z cachedir: .pytest_cache 2025-12-04T12:44:29.0259128Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0259829Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0260783Z configfile: pytest.ini 2025-12-04T12:44:29.0261301Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0261820Z collecting ... 
collected 8 items 2025-12-04T12:44:29.0262105Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:44:29.0265650Z Running 8 items in this shard: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.0269919Z 2025-12-04T12:44:29.0270648Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 12:39:16.427000 339794 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 339863 2025-12-04T12:44:29.0271644Z I1204 12:39:16.428000 339794 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 339864 2025-12-04T12:44:29.0272236Z I1204 12:39:16.429000 339794 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 339865 2025-12-04T12:44:29.0272829Z I1204 12:39:16.429000 339794 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 339866 2025-12-04T12:44:29.0274604Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0275594Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0276568Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.0277541Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0278543Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0279569Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0280582Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0281567Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0283313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0284831Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0286330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:44:29.0287768Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0289289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0290805Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0292242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:44:29.0293704Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0294002Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0294337Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0294819Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0295290Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0295757Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0296192Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0296657Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0297111Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0297565Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0298015Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0298467Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0298907Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0299350Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0299842Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0300547Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3242196992. 
2025-12-04T12:44:29.0301231Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0301588Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0302243Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0302812Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0303169Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0303570Z E1204 12:39:25.182000 339863 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0303904Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0304229Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0304704Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0305170Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0305636Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0306072Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0306552Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0307005Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0307459Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0307909Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0308362Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0308806Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0309249Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0309753Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0310474Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1105199104 and is now 3076521984. 2025-12-04T12:44:29.0311156Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0311496Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0312153Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0312721Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0313074Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0313479Z E1204 12:39:25.193000 339866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0313808Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0314149Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0314623Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0315090Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0315591Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0316022Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0316444Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0316889Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0317339Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.0317783Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0318235Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0318670Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0319107Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0319612Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0320322Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:44:29.0320997Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0321331Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0321977Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0322542Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0322895Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0323292Z E1204 12:39:25.210000 339865 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0323619Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0323944Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0324420Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0324884Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0325381Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0325813Z E1204 12:39:25.231000 339864 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0326239Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0326686Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0327134Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0327582Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0328031Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0328468Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0328926Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0329376Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0330111Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
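The FutureWarning emitted by test_hsdp_dtensor_state_dict.py earlier in this log recommends the torch.distributed.checkpoint.state_dict APIs over FSDP.set_state_dict_type(). A minimal sketch of that suggested replacement, assuming an FSDP-wrapped model and its optimizer already exist (model and optimizer are placeholders; usage follows the API doc linked in the warning):

    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    # Fetch matching model and optimizer state dicts in one call.
    model_sd, optim_sd = get_state_dict(model, optimizer)

    # ... persist or transform the state dicts as needed ...

    # Restore them later; per the warning this path covers FSDP1, FSDP2 and DDP.
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
    )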
2025-12-04T12:44:29.0330769Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0331104Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0331752Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0332320Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0332668Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0333067Z E1204 12:39:25.231000 339864 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0333305Z FAILED [10.1200s] [ 12%] 2025-12-04T12:44:29.0333377Z 2025-12-04T12:44:29.0333436Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0333684Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.0333917Z Traceback (most recent call last): 2025-12-04T12:44:29.0334165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0334448Z self._join_processes(fn) 2025-12-04T12:44:29.0334697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0334963Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0335238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0335500Z raise RuntimeError(error) 2025-12-04T12:44:29.0335655Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0335818Z Traceback (most recent call last): 2025-12-04T12:44:29.0336059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0336306Z getattr(self, test_name)() 2025-12-04T12:44:29.0336544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0336778Z fn() 2025-12-04T12:44:29.0336983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0337216Z method(*args, **kwargs) 2025-12-04T12:44:29.0337441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0337671Z method(*args, **kwargs) 2025-12-04T12:44:29.0337908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0338138Z with policy(): 2025-12-04T12:44:29.0338360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0338595Z raise RuntimeError(msg) 2025-12-04T12:44:29.0339088Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3242196992. 2025-12-04T12:44:29.0339522Z 2025-12-04T12:44:29.0339625Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0340041Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0340385Z 2025-12-04T12:44:29.0340476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0340604Z 2025-12-04T12:44:29.0340605Z 2025-12-04T12:44:29.0340688Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0340894Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0341297Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-cc2b69bb2c652278.xml - 2025-12-04T12:44:29.0341664Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0342083Z FAILED [10.1200s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0342479Z Traceback (most recent call last): 2025-12-04T12:44:29.0342728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0342975Z getattr(self, test_name)() 2025-12-04T12:44:29.0343210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0343484Z fn() 2025-12-04T12:44:29.0343686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0343922Z method(*args, **kwargs) 2025-12-04T12:44:29.0344143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0344376Z method(*args, **kwargs) 2025-12-04T12:44:29.0344598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0344828Z with policy(): 2025-12-04T12:44:29.0345042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0345275Z raise RuntimeError(msg) 2025-12-04T12:44:29.0345747Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3242196992. 
2025-12-04T12:44:29.0346185Z 2025-12-04T12:44:29.0346260Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0346679Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0347041Z 2025-12-04T12:44:29.0347129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0347318Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0347478Z ============================== 1 failed in 10.13s ============================== 2025-12-04T12:44:29.0347626Z Got exit code 1 2025-12-04T12:44:29.0347733Z Retrying single test... 2025-12-04T12:44:29.0348026Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9152f00d6bb6fe13.xml 2025-12-04T12:44:29.0348347Z ============================= test session starts ============================== 2025-12-04T12:44:29.0348564Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0348756Z cachedir: .pytest_cache 2025-12-04T12:44:29.0348983Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0349225Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0349345Z configfile: pytest.ini 2025-12-04T12:44:29.0349613Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0349888Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0350289Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0350661Z Running 1 items in this shard 2025-12-04T12:44:29.0350737Z 2025-12-04T12:44:29.0351111Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 12:39:29.322000 340264 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 340333 2025-12-04T12:44:29.0351678Z I1204 12:39:29.323000 340264 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 340334 2025-12-04T12:44:29.0352021Z I1204 12:39:29.323000 340264 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 340335 2025-12-04T12:44:29.0352403Z I1204 12:39:29.324000 340264 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 340336 2025-12-04T12:44:29.0353282Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.0354032Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0354772Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0355515Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0356250Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0357011Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0357748Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0358499Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0359869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0361299Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0362757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0364158Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0365578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0366998Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0368404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:44:29.0369854Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0370151Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0370479Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0370951Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0371416Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0371886Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0372347Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0372774Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0373223Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0373674Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0374122Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0374576Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0375013Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0375451Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0375899Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0376628Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
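[Editorial note] The repeated AccumulateGrad stream-mismatch UserWarning above is separate from the leak failure; per the warning's own text, it can be silenced when the mismatch is intentional. A one-line sketch (whether silencing is appropriate for this test is an assumption, not something the log establishes):

import torch

# The UserWarning above states this switch suppresses the AccumulateGrad
# stream-mismatch warning when the mismatch is intentional; call it before
# the backward pass that triggers the warning.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)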
2025-12-04T12:44:29.0377301Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0377638Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0378288Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0378855Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0379205Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0379647Z E1204 12:39:37.959000 340334 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0379976Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0380299Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0380769Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0381233Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0381737Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0382168Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0382589Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0383038Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0383487Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0383938Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0384395Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0384829Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0385267Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0385731Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0386433Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840. 2025-12-04T12:44:29.0387103Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0387437Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0388084Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0388651Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0389004Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0402635Z E1204 12:39:37.969000 340333 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0402997Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0403333Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0403840Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0404316Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0404865Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0405308Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0405740Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0406200Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0406659Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.0407115Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0407569Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0408011Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0408482Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0408939Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0409699Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:44:29.0410373Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0410718Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0411385Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0411971Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0412333Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0412741Z E1204 12:39:37.973000 340335 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0413077Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0413410Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0413895Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0414401Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0414870Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0415310Z E1204 12:39:37.980000 340336 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0415740Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0416194Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0416665Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0417122Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0417576Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0418042Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0418492Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0418969Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0419725Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1256194048 and is now 3076521984. 
2025-12-04T12:44:29.0420389Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0420730Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0421386Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0421953Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0422309Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0422714Z E1204 12:39:37.980000 340336 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0422955Z FAILED [9.9192s] [100%] 2025-12-04T12:44:29.0423029Z 2025-12-04T12:44:29.0423089Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0423343Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.0423618Z Traceback (most recent call last): 2025-12-04T12:44:29.0423877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0424134Z self._join_processes(fn) 2025-12-04T12:44:29.0424389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0424661Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0424937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0425204Z raise RuntimeError(error) 2025-12-04T12:44:29.0425361Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0425531Z Traceback (most recent call last): 2025-12-04T12:44:29.0425778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0426034Z getattr(self, test_name)() 2025-12-04T12:44:29.0426274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0426512Z fn() 2025-12-04T12:44:29.0426722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0426960Z method(*args, **kwargs) 2025-12-04T12:44:29.0427190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0427448Z method(*args, **kwargs) 2025-12-04T12:44:29.0427675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0427911Z with policy(): 2025-12-04T12:44:29.0428148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0428386Z raise RuntimeError(msg) 2025-12-04T12:44:29.0428860Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840. 2025-12-04T12:44:29.0429298Z 2025-12-04T12:44:29.0429375Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0429838Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0430186Z 2025-12-04T12:44:29.0430282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0430411Z 2025-12-04T12:44:29.0430479Z Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.0430629Z Traceback (most recent call last): 2025-12-04T12:44:29.0430881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0431131Z getattr(self, test_name)() 2025-12-04T12:44:29.0431373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0431612Z fn() 2025-12-04T12:44:29.0431825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0432064Z method(*args, **kwargs) 2025-12-04T12:44:29.0432291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0432530Z method(*args, **kwargs) 2025-12-04T12:44:29.0432792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0433028Z with policy(): 2025-12-04T12:44:29.0433253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0433494Z raise RuntimeError(msg) 2025-12-04T12:44:29.0433970Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:44:29.0434400Z 2025-12-04T12:44:29.0434481Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0434906Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0435243Z 2025-12-04T12:44:29.0435339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0435464Z 2025-12-04T12:44:29.0435466Z 2025-12-04T12:44:29.0435553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0435762Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.0436167Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9152f00d6bb6fe13.xml - 2025-12-04T12:44:29.0436571Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0436990Z FAILED [9.9192s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0437408Z Traceback (most recent call last): 2025-12-04T12:44:29.0437662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0437914Z getattr(self, test_name)() 2025-12-04T12:44:29.0438154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0438394Z fn() 2025-12-04T12:44:29.0438605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0438846Z method(*args, **kwargs) 2025-12-04T12:44:29.0439076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0439319Z method(*args, **kwargs) 2025-12-04T12:44:29.0439550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0439825Z with policy(): 2025-12-04T12:44:29.0440049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0440289Z raise RuntimeError(msg) 2025-12-04T12:44:29.0440768Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840. 
2025-12-04T12:44:29.0441210Z 2025-12-04T12:44:29.0441288Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0441711Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0442087Z 2025-12-04T12:44:29.0442180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0442304Z 2025-12-04T12:44:29.0442366Z Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.0442507Z Traceback (most recent call last): 2025-12-04T12:44:29.0442756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0443003Z getattr(self, test_name)() 2025-12-04T12:44:29.0443238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0443472Z fn() 2025-12-04T12:44:29.0443676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0443907Z method(*args, **kwargs) 2025-12-04T12:44:29.0444131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0444361Z method(*args, **kwargs) 2025-12-04T12:44:29.0444580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0444806Z with policy(): 2025-12-04T12:44:29.0445019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0445273Z raise RuntimeError(msg) 2025-12-04T12:44:29.0445739Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:44:29.0446184Z 2025-12-04T12:44:29.0446260Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0446673Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0447012Z 2025-12-04T12:44:29.0447099Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0447289Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0447458Z ======================= 1 failed, 7 deselected in 9.93s ======================== 2025-12-04T12:44:29.0447600Z Got exit code 1 2025-12-04T12:44:29.0447700Z Retrying single test... 
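[Editorial note] Each attempt in this log also emits the same FutureWarning about FSDP.state_dict_type()/FSDP.set_state_dict_type() being deprecated in favor of the torch.distributed.checkpoint.state_dict APIs. A minimal sketch of that migration, assuming an FSDP-wrapped model and an optimizer (the names and options below are illustrative assumptions, not taken from the failing test):

from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def roundtrip_state_dict(model, optimizer):
    # Keep shards on GPU, mirroring the offload_to_cpu_False variant of the test.
    options = StateDictOptions(cpu_offload=False)
    model_sd, optim_sd = get_state_dict(model, optimizer, options=options)
    # ... persist or transform model_sd / optim_sd here ...
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )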
2025-12-04T12:44:29.0447998Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-2d3dc27ffee4c9ac.xml 2025-12-04T12:44:29.0448324Z ============================= test session starts ============================== 2025-12-04T12:44:29.0448541Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0448734Z cachedir: .pytest_cache 2025-12-04T12:44:29.0448960Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0449202Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0449326Z configfile: pytest.ini 2025-12-04T12:44:29.0449557Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0449874Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0450277Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0450649Z Running 1 items in this shard 2025-12-04T12:44:29.0450758Z 2025-12-04T12:44:29.0451139Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 12:39:41.740000 340734 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 340803 2025-12-04T12:44:29.0451712Z I1204 12:39:41.740000 340734 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 340804 2025-12-04T12:44:29.0452060Z I1204 12:39:41.741000 340734 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 340805 2025-12-04T12:44:29.0452405Z I1204 12:39:41.741000 340734 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 340806 2025-12-04T12:44:29.0453293Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0454046Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0454781Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0455560Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0456295Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0457035Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0457765Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0458503Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0459908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0461329Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0462745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0464152Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0465584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0467024Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0468447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T12:44:29.0469904Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0470201Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0470531Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0471052Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0471514Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0471979Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0472410Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0472832Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0473279Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0473721Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0474162Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0474626Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0475182Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0475624Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0476085Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0476783Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
2025-12-04T12:44:29.0477489Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0477822Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0478470Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0479030Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0479375Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0479812Z E1204 12:39:50.479000 340806 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0480137Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0480492Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0480959Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0481416Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0481878Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0482305Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0482732Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0483177Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0483618Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0484079Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0484522Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0484975Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0485410Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0485855Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0486546Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3240099840. 2025-12-04T12:44:29.0487198Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0487532Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0488176Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0488733Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0489076Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0489469Z E1204 12:39:50.486000 340803 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0489870Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0490187Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0490652Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0491112Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0491570Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0492005Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0492424Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0492866Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0493309Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.0493766Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0494215Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0494659Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0495095Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0495540Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0496231Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:44:29.0496889Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0497220Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0497864Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0498421Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0498823Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0499243Z E1204 12:39:50.521000 340804 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0499567Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0499926Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0500394Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0500855Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0501362Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0501793Z E1204 12:39:50.528000 340805 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0502253Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0502700Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0503161Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0503636Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0504096Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0504562Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0504998Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0505447Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0506144Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
2025-12-04T12:44:29.0506799Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0507132Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0507783Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0508345Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0508741Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0509132Z E1204 12:39:50.528000 340805 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0509364Z FAILED [10.1199s] [100%] 2025-12-04T12:44:29.0509432Z 2025-12-04T12:44:29.0509488Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0509763Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.0509992Z Traceback (most recent call last): 2025-12-04T12:44:29.0510233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0510473Z self._join_processes(fn) 2025-12-04T12:44:29.0510721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0510984Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0511251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0511507Z raise RuntimeError(error) 2025-12-04T12:44:29.0511655Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0511813Z Traceback (most recent call last): 2025-12-04T12:44:29.0512072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0512313Z getattr(self, test_name)() 2025-12-04T12:44:29.0512543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0512786Z fn() 2025-12-04T12:44:29.0512993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0513219Z method(*args, **kwargs) 2025-12-04T12:44:29.0513438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0513668Z method(*args, **kwargs) 2025-12-04T12:44:29.0513884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0514110Z with policy(): 2025-12-04T12:44:29.0514319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0514546Z raise RuntimeError(msg) 2025-12-04T12:44:29.0515016Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3076521984. 2025-12-04T12:44:29.0515446Z 2025-12-04T12:44:29.0515519Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0515928Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0516267Z 2025-12-04T12:44:29.0516357Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0516483Z 2025-12-04T12:44:29.0516485Z 2025-12-04T12:44:29.0516563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0516762Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0517191Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-2d3dc27ffee4c9ac.xml - 2025-12-04T12:44:29.0517554Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0517962Z FAILED [10.1199s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0518350Z Traceback (most recent call last): 2025-12-04T12:44:29.0518593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0518834Z getattr(self, test_name)() 2025-12-04T12:44:29.0519068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0519297Z fn() 2025-12-04T12:44:29.0519499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0519767Z method(*args, **kwargs) 2025-12-04T12:44:29.0519985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0520210Z method(*args, **kwargs) 2025-12-04T12:44:29.0520425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0520668Z with policy(): 2025-12-04T12:44:29.0520877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0521106Z raise RuntimeError(msg) 2025-12-04T12:44:29.0521572Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3076521984. 
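Every rank fails the same way: the mem_leak_check wrapper snapshots caching-allocator and driver-level memory per device before the test body runs and compares the numbers again in the context manager's __exit__ (common_utils.py line 2705), and here both counters grew (13824 bytes of caching-allocator memory plus roughly 1.8 GB at the driver level), so each process exits with code 10. A minimal, hypothetical sketch of that style of check, not the actual common_utils.py implementation; check_for_leak and run_test are made-up names, and torch.cuda.mem_get_info stands in for the driver-API query:

    # Hypothetical sketch of a before/after memory comparison; not PyTorch's checker.
    import torch

    def check_for_leak(device, run_test):
        torch.cuda.synchronize(device)
        caching_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        free, total = torch.cuda.mem_get_info(device)
        driver_before = total - free                            # driver-level usage

        run_test()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        caching_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free
        if caching_after > caching_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator "
                f"{caching_before} -> {caching_after}, driver {driver_before} -> {driver_after}"
            )

Per the repro hint printed with each failure, the test can be rerun standalone with PYTORCH_TEST_WITH_ROCM=1 and PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 set; PYTORCH_PRINT_REPRO_ON_FAILURE=0 only hides the repro message, it does not disable the leak check.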
2025-12-04T12:44:29.0522018Z 2025-12-04T12:44:29.0522091Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0522508Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0522847Z 2025-12-04T12:44:29.0522935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0523120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0523281Z ======================= 1 failed, 7 deselected in 10.13s ======================= 2025-12-04T12:44:29.0523416Z Got exit code 1 2025-12-04T12:44:29.0523724Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0524131Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.0524515Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7802117e87e0d3bb.xml 2025-12-04T12:44:29.0524832Z ============================= test session starts ============================== 2025-12-04T12:44:29.0525047Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0525233Z cachedir: .pytest_cache 2025-12-04T12:44:29.0525457Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0525693Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0525810Z configfile: pytest.ini 2025-12-04T12:44:29.0526071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0526339Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T12:44:29.0526495Z stepcurrent: skipping 1 already run items. 2025-12-04T12:44:29.0526622Z Running 7 items in this shard 2025-12-04T12:44:29.0526692Z 2025-12-04T12:44:29.0527066Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 12:39:54.365000 341204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 341273 2025-12-04T12:44:29.0527624Z I1204 12:39:54.366000 341204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 341274 2025-12-04T12:44:29.0527960Z I1204 12:39:54.366000 341204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 341275 2025-12-04T12:44:29.0528299Z I1204 12:39:54.367000 341204 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 341276 2025-12-04T12:44:29.0529168Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0529956Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0530696Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0531450Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0532184Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0532920Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0533658Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0534403Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0535789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0537198Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0538611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0540044Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0541457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0542881Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0544294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
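The FutureWarning emitted from test_hsdp_dtensor_state_dict.py:243 recommends moving off FSDP.set_state_dict_type() to the checkpoint state-dict helpers it links. A hedged migration sketch against torch.distributed.checkpoint.state_dict, where model and optim are placeholders for the FSDP-wrapped module and its optimizer rather than names taken from this test:

    # Sketch of the migration the FutureWarning above suggests; `model` and `optim`
    # are placeholders, and the cpu_offload option mirrors the offload_to_cpu test parameter.
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    # Instead of FSDP.set_state_dict_type(...) followed by model.state_dict():
    model_sd, optim_sd = get_state_dict(
        model, optim, options=StateDictOptions(cpu_offload=True)
    )

    # Instead of FSDP.set_state_dict_type(...) followed by model.load_state_dict(...):
    set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)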
2025-12-04T12:44:29.0545703Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0546056Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0546389Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0546875Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0547342Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0547808Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0548244Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0548675Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0549129Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0549620Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0550091Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0550541Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0550992Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0551587Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0552034Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0552733Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 
2025-12-04T12:44:29.0553394Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0553726Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0554369Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0554929Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0555273Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0555712Z E1204 12:40:02.980000 341273 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0556036Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0556357Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0556827Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0557286Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0557744Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0558172Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0558595Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0559041Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0559503Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0560013Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0560476Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0560906Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0561340Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0561792Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0562482Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:44:29.0563139Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0563467Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0564111Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0564675Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0565053Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0565457Z E1204 12:40:03.001000 341275 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0565787Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0566113Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0566596Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0567062Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0567530Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0567966Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0568395Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0568862Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0569314Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:44:29.0569990Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0570448Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0570892Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0571340Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0571794Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0572497Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 958398464 and is now 3053453312. 2025-12-04T12:44:29.0573155Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0573493Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0574231Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0574834Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0575180Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0575577Z E1204 12:40:03.029000 341276 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0575908Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0576236Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0576709Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0577174Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0577634Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0578069Z E1204 12:40:03.097000 341274 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0578519Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0578971Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0579446Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0579938Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0580396Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0580840Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0581280Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0581733Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0582426Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
2025-12-04T12:44:29.0583086Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0583424Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0584102Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0584667Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0585015Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0585410Z E1204 12:40:03.097000 341274 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0585652Z FAILED [10.0206s] [ 14%] 2025-12-04T12:44:29.0585725Z 2025-12-04T12:44:29.0585784Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0586031Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.0586271Z Traceback (most recent call last): 2025-12-04T12:44:29.0586523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0586771Z self._join_processes(fn) 2025-12-04T12:44:29.0587025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0587295Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0587566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0587848Z raise RuntimeError(error) 2025-12-04T12:44:29.0588000Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0588164Z Traceback (most recent call last): 2025-12-04T12:44:29.0588412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0588678Z getattr(self, test_name)() 2025-12-04T12:44:29.0588918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0589155Z fn() 2025-12-04T12:44:29.0589367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0589636Z method(*args, **kwargs) 2025-12-04T12:44:29.0589861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0590100Z method(*args, **kwargs) 2025-12-04T12:44:29.0590325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0590557Z with policy(): 2025-12-04T12:44:29.0590776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0591020Z raise RuntimeError(msg) 2025-12-04T12:44:29.0591491Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 2025-12-04T12:44:29.0591923Z 2025-12-04T12:44:29.0591999Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0592409Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0592748Z 2025-12-04T12:44:29.0592835Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0592958Z 2025-12-04T12:44:29.0593024Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.0593204Z Traceback (most recent call last): 2025-12-04T12:44:29.0593454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0593697Z getattr(self, test_name)() 2025-12-04T12:44:29.0593935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0594167Z fn() 2025-12-04T12:44:29.0594370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0594598Z method(*args, **kwargs) 2025-12-04T12:44:29.0594815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0595042Z method(*args, **kwargs) 2025-12-04T12:44:29.0595266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0595491Z with policy(): 2025-12-04T12:44:29.0595702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0595933Z raise RuntimeError(msg) 2025-12-04T12:44:29.0596396Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:44:29.0596841Z 2025-12-04T12:44:29.0596918Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0597329Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0597685Z 2025-12-04T12:44:29.0597772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0597898Z 2025-12-04T12:44:29.0597900Z 2025-12-04T12:44:29.0597976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0598174Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.0598565Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7802117e87e0d3bb.xml - 2025-12-04T12:44:29.0598930Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0599339Z FAILED [10.0206s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0599770Z Traceback (most recent call last): 2025-12-04T12:44:29.0600015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0600259Z getattr(self, test_name)() 2025-12-04T12:44:29.0600492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0600747Z fn() 2025-12-04T12:44:29.0600948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0601176Z method(*args, **kwargs) 2025-12-04T12:44:29.0601395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0601620Z method(*args, **kwargs) 2025-12-04T12:44:29.0601842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0602101Z with policy(): 2025-12-04T12:44:29.0602310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0602540Z raise RuntimeError(msg) 2025-12-04T12:44:29.0603004Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 
2025-12-04T12:44:29.0603440Z 2025-12-04T12:44:29.0603515Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0603928Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0604266Z 2025-12-04T12:44:29.0604359Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0604482Z 2025-12-04T12:44:29.0604541Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.0604681Z Traceback (most recent call last): 2025-12-04T12:44:29.0604925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0605168Z getattr(self, test_name)() 2025-12-04T12:44:29.0605416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0605647Z fn() 2025-12-04T12:44:29.0605846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0606075Z method(*args, **kwargs) 2025-12-04T12:44:29.0606313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0606542Z method(*args, **kwargs) 2025-12-04T12:44:29.0606758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0606982Z with policy(): 2025-12-04T12:44:29.0607195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0607424Z raise RuntimeError(msg) 2025-12-04T12:44:29.0607892Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:44:29.0608318Z 2025-12-04T12:44:29.0608399Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0608814Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0609153Z 2025-12-04T12:44:29.0609240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0609431Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0609637Z ======================= 1 failed, 1 deselected in 10.03s ======================= 2025-12-04T12:44:29.0609779Z Got exit code 1 2025-12-04T12:44:29.0609877Z Retrying single test... 
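The retried session header below repeats the suite's Hypothesis configuration: profile 'pytorch_ci' with database=None, max_examples=50, derandomize=True, and HealthCheck.too_slow suppressed. For reference, a profile like that is normally registered and loaded from a conftest.py; this is an illustrative sketch with the values copied from the log, not PyTorch's actual conftest:

    # Illustrative registration of a Hypothesis profile matching the session header.
    from hypothesis import HealthCheck, settings

    settings.register_profile(
        "pytorch_ci",
        database=None,
        max_examples=50,
        derandomize=True,
        suppress_health_check=[HealthCheck.too_slow],
    )
    settings.load_profile("pytorch_ci")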
2025-12-04T12:44:29.0610168Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-483b30c5bd7b37b3.xml 2025-12-04T12:44:29.0610487Z ============================= test session starts ============================== 2025-12-04T12:44:29.0610744Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0610937Z cachedir: .pytest_cache 2025-12-04T12:44:29.0611160Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0611399Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0611521Z configfile: pytest.ini 2025-12-04T12:44:29.0611751Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0612026Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0612429Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0612799Z Running 1 items in this shard 2025-12-04T12:44:29.0612874Z 2025-12-04T12:44:29.0613254Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 12:40:06.996000 341674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 341743 2025-12-04T12:44:29.0613817Z I1204 12:40:06.997000 341674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 341744 2025-12-04T12:44:29.0614159Z I1204 12:40:06.997000 341674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 341745 2025-12-04T12:44:29.0614518Z I1204 12:40:06.998000 341674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 341746 2025-12-04T12:44:29.0615390Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0616159Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0616899Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0617636Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0618369Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0619106Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0619881Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0620657Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0621994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0623398Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0624815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0626248Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0627683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0629107Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0630620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
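The UserWarning repeated above (autograd/graph.py:865) names its own opt-out: when the AccumulateGrad stream mismatch is intentional, the warning text says it can be silenced with torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False). A one-line sketch using exactly the call the warning mentions; whether suppressing it is appropriate here depends on the test's stream usage:

    # Silence the AccumulateGrad stream-mismatch warning, per the warning's own suggestion.
    # Only appropriate if the mismatch is known to be intentional.
    import torch

    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)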
2025-12-04T12:44:29.0632052Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0632346Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0632673Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0633151Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0633623Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0634094Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0634545Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0634971Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0635435Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0635884Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0636334Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0636785Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0637224Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0637666Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0638116Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0638811Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 
2025-12-04T12:44:29.0639471Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0639854Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0640542Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0641113Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0641467Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0641865Z E1204 12:40:15.680000 341743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0642193Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0642521Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0642996Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0643461Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0643922Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0644369Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0644794Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0645257Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0645709Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0646154Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0646602Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0647039Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0647484Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0647934Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0648626Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3053453312. 2025-12-04T12:44:29.0649279Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0649686Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0650336Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0650900Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0651251Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0651646Z E1204 12:40:15.702000 341745 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0651978Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0652301Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0652775Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0653238Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0653717Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0654166Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0654593Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0655040Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0655491Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:44:29.0655938Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0656385Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0656826Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0657262Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0657715Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0658410Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 958398464 and is now 3053453312. 2025-12-04T12:44:29.0659086Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0659421Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0660100Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0660664Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0661015Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0661415Z E1204 12:40:15.725000 341746 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0661742Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0662063Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0662533Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0663012Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0663474Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0663919Z E1204 12:40:15.750000 341744 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0664342Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0664790Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0665239Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0665683Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0666135Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0666571Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0667007Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0667460Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0668180Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
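Each rank prints the same repro line: PYTORCH_TEST_WITH_ROCM=1 selects the ROCm test paths and PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables the allocator/driver comparison that failed here (PYTORCH_PRINT_REPRO_ON_FAILURE=0 only silences the repro message). A small wrapper that drives the same command from Python, with the command and variables copied from the log and the wrapper itself purely illustrative:

import os
import subprocess

env = dict(os.environ,
           PYTORCH_TEST_WITH_ROCM="1",
           PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")

# Same invocation the log suggests; run from the base repo dir.
subprocess.run(
    ["python",
     "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
     "TestHSDPWithDeviceMeshAndDTensorCUDA."
     "test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda"],
    env=env,
    check=True,
)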
2025-12-04T12:44:29.0668836Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0669170Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0669851Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0670414Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0670764Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0671167Z E1204 12:40:15.750000 341744 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0671404Z FAILED [10.0209s] [100%] 2025-12-04T12:44:29.0671475Z 2025-12-04T12:44:29.0671535Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0671781Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.0672032Z Traceback (most recent call last): 2025-12-04T12:44:29.0672281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0672529Z self._join_processes(fn) 2025-12-04T12:44:29.0672796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0673063Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0673334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0673595Z raise RuntimeError(error) 2025-12-04T12:44:29.0673748Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0673911Z Traceback (most recent call last): 2025-12-04T12:44:29.0674157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0674401Z getattr(self, test_name)() 2025-12-04T12:44:29.0674636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0674871Z fn() 2025-12-04T12:44:29.0675079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0675311Z method(*args, **kwargs) 2025-12-04T12:44:29.0675532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0675763Z method(*args, **kwargs) 2025-12-04T12:44:29.0675984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0676213Z with policy(): 2025-12-04T12:44:29.0676433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0676672Z raise RuntimeError(msg) 2025-12-04T12:44:29.0677172Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 2025-12-04T12:44:29.0677604Z 2025-12-04T12:44:29.0677680Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0678091Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0678433Z 2025-12-04T12:44:29.0678524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0678650Z 2025-12-04T12:44:29.0678651Z 2025-12-04T12:44:29.0678733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0678935Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0679336Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-483b30c5bd7b37b3.xml - 2025-12-04T12:44:29.0679746Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0680157Z FAILED [10.0209s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0680547Z Traceback (most recent call last): 2025-12-04T12:44:29.0680794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0681057Z getattr(self, test_name)() 2025-12-04T12:44:29.0681293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0681529Z fn() 2025-12-04T12:44:29.0681758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0681990Z method(*args, **kwargs) 2025-12-04T12:44:29.0682213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0682443Z method(*args, **kwargs) 2025-12-04T12:44:29.0682662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0687545Z with policy(): 2025-12-04T12:44:29.0687782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0688017Z raise RuntimeError(msg) 2025-12-04T12:44:29.0688488Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3219128320. 
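The traceback shows the shape of the multiprocess harness: the parent spawns one process per rank, joins them in _join_processes, and _check_return_codes turns a non-zero exit code (10 here) into the RuntimeError above. A much-reduced sketch of that pattern with torch.multiprocessing; the real harness lives in common_distributed.py and the names below are illustrative only:

import torch.multiprocessing as mp

def _rank_main(rank: int, world_size: int):
    # Per-rank test body would go here; exit code 10 is what the log
    # uses to signal a test error inside a rank.
    raise SystemExit(10)

def run_multiprocess(world_size: int = 4):
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_rank_main, args=(r, world_size))
             for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for r, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {r} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_multiprocess()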
2025-12-04T12:44:29.0688919Z 2025-12-04T12:44:29.0688997Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0689411Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0689793Z 2025-12-04T12:44:29.0689882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0690073Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0690238Z ======================= 1 failed, 7 deselected in 10.03s ======================= 2025-12-04T12:44:29.0690375Z Got exit code 1 2025-12-04T12:44:29.0690474Z Retrying single test... 2025-12-04T12:44:29.0690816Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-ae4bcd06d210b4c8.xml 2025-12-04T12:44:29.0691140Z ============================= test session starts ============================== 2025-12-04T12:44:29.0691355Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0691548Z cachedir: .pytest_cache 2025-12-04T12:44:29.0691770Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0692013Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0692130Z configfile: pytest.ini 2025-12-04T12:44:29.0692358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0692627Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0693029Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0693398Z Running 1 items in this shard 2025-12-04T12:44:29.0693471Z 2025-12-04T12:44:29.0693849Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 12:40:19.921000 342144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 342213 2025-12-04T12:44:29.0694429Z I1204 12:40:19.922000 342144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 342214 2025-12-04T12:44:29.0694766Z I1204 12:40:19.922000 342144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 342215 2025-12-04T12:44:29.0695105Z I1204 12:40:19.923000 342144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 342216 2025-12-04T12:44:29.0695999Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.0696745Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0697485Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0698228Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0698963Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0699735Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0700497Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0701233Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0702589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0704006Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0705425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0706861Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0708270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T12:44:29.0709721Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0711162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
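Besides dropping graph references, the warning offers a second fix: perform DDP initialization on the same CUDA stream that later forwards run on, so the AccumulateGrad nodes it stashes already belong to that stream. A minimal single-process sketch of that arrangement (the rank-0/world-size-1 process-group setup and the tiny model are placeholders; a real job would be launched with torchrun across ranks):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

fwd_stream = torch.cuda.Stream()

# Build DDP under the same stream the forwards will use, as the warning asks.
with torch.cuda.stream(fwd_stream):
    model = DDP(torch.nn.Linear(8, 8).cuda(), device_ids=[0])

for _ in range(2):
    with torch.cuda.stream(fwd_stream):
        loss = model(torch.randn(4, 8, device="cuda")).sum()
    loss.backward()
    del loss

dist.destroy_process_group()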
2025-12-04T12:44:29.0712577Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.0712866Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0713196Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0713666Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0714128Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0714593Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0715038Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0715463Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0715920Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0716370Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0716817Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0717265Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0717708Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0718151Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0718599Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0719300Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3376414720. 
2025-12-04T12:44:29.0720002Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0720366Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0721017Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0721583Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0721937Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0722336Z E1204 12:40:28.497000 342213 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0722666Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0722984Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0723454Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0724216Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0724678Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0725121Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0725543Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0725989Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0726436Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0726887Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0727335Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0727769Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0728210Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0728660Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0729359Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1107296256 and is now 3422552064. 2025-12-04T12:44:29.0730077Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0730408Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0731054Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0731615Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0731963Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0732366Z E1204 12:40:28.555000 342216 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0732688Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0733010Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0733481Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0733959Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0734422Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0734866Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0735289Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0735743Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0736196Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:44:29.0736644Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0737098Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0737539Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0737978Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0738427Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0739139Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1262485504 and is now 3219128320. 2025-12-04T12:44:29.0739835Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0740170Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0740813Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0741375Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0741721Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0742117Z E1204 12:40:28.585000 342215 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0742441Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0742763Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0743246Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0743712Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0744188Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0744617Z E1204 12:40:29.018000 342214 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0745034Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0745483Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0745930Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0746378Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0746824Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0747253Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0747529Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0747669Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0748321Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3053453312. 
2025-12-04T12:44:29.0748431Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0748619Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0749045Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0749157Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0749366Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0749528Z E1204 12:40:29.018000 342214 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0749611Z FAILED [9.9199s] [100%] 2025-12-04T12:44:29.0749630Z 2025-12-04T12:44:29.0749691Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0749838Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.0749889Z Traceback (most recent call last): 2025-12-04T12:44:29.0750052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0750123Z self._join_processes(fn) 2025-12-04T12:44:29.0750298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0750357Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0750537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0750583Z raise RuntimeError(error) 2025-12-04T12:44:29.0750667Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0750715Z Traceback (most recent call last): 2025-12-04T12:44:29.0750876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0750923Z getattr(self, test_name)() 2025-12-04T12:44:29.0751089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0751126Z fn() 2025-12-04T12:44:29.0751280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0751324Z method(*args, **kwargs) 2025-12-04T12:44:29.0751476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0751520Z method(*args, **kwargs) 2025-12-04T12:44:29.0751674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0751715Z with policy(): 2025-12-04T12:44:29.0751869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0751912Z raise RuntimeError(msg) 2025-12-04T12:44:29.0752354Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3376414720. 2025-12-04T12:44:29.0752357Z 2025-12-04T12:44:29.0752435Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0752740Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0752743Z 2025-12-04T12:44:29.0752830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0752833Z 2025-12-04T12:44:29.0752834Z 2025-12-04T12:44:29.0752914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0753003Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0753281Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-ae4bcd06d210b4c8.xml - 2025-12-04T12:44:29.0753345Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0753660Z FAILED [9.9199s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0753722Z Traceback (most recent call last): 2025-12-04T12:44:29.0753960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0754005Z getattr(self, test_name)() 2025-12-04T12:44:29.0754180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0754224Z fn() 2025-12-04T12:44:29.0754377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0754419Z method(*args, **kwargs) 2025-12-04T12:44:29.0754571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0754613Z method(*args, **kwargs) 2025-12-04T12:44:29.0754762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0754802Z with policy(): 2025-12-04T12:44:29.0754956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0754999Z raise RuntimeError(msg) 2025-12-04T12:44:29.0755398Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3376414720. 
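The messages around this second failure ("Got exit code 1", "Retrying single test...", then "FAILED CONSISTENTLY ... continuing ... due to continue-through-error being set") describe the runner's strategy: rerun only the failed test once, and mark it as consistently failing and move on with the shard only if the retry also fails. A simplified sketch of that control flow; run_single stands in for the pytest subprocess and none of this is the actual run_test.py code:

import subprocess

def run_single(test_id: str) -> int:
    # Stand-in for the pytest invocation shown in the log; returns the exit code.
    return subprocess.run(["python", "-m", "pytest", test_id, "-x"]).returncode

def run_with_retry(test_id: str, continue_through_error: bool = True) -> bool:
    if run_single(test_id) == 0:
        return True
    print("Got exit code 1")
    print("Retrying single test...")
    if run_single(test_id) == 0:
        return True  # flaky: the retry passed
    print(f"FAILED CONSISTENTLY: {test_id}")
    if not continue_through_error:
        raise RuntimeError(f"{test_id} failed consistently")
    return False  # keep going with the rest of the shard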
2025-12-04T12:44:29.0755403Z 2025-12-04T12:44:29.0755479Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0755782Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0755785Z 2025-12-04T12:44:29.0755871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0755937Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0755998Z ======================= 1 failed, 7 deselected in 9.93s ======================== 2025-12-04T12:44:29.0756039Z Got exit code 1 2025-12-04T12:44:29.0756314Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0756448Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.0756677Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b149cca1e6d1f159.xml 2025-12-04T12:44:29.0756740Z ============================= test session starts ============================== 2025-12-04T12:44:29.0756855Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0756899Z cachedir: .pytest_cache 2025-12-04T12:44:29.0757061Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0757108Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0757156Z configfile: pytest.ini 2025-12-04T12:44:29.0757318Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0757391Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T12:44:29.0757445Z stepcurrent: skipping 2 already run items. 2025-12-04T12:44:29.0757491Z Running 6 items in this shard 2025-12-04T12:44:29.0757493Z 2025-12-04T12:44:29.0757868Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 12:40:32.604000 342614 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 342683 2025-12-04T12:44:29.0758038Z I1204 12:40:32.605000 342614 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 342684 2025-12-04T12:44:29.0758201Z I1204 12:40:32.605000 342614 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 342685 2025-12-04T12:44:29.0758354Z I1204 12:40:32.606000 342614 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 342686 2025-12-04T12:44:29.0759042Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0759086Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0759789Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0759833Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0760497Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0760545Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0761234Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0761278Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0761776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0761829Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0762319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0762367Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0762871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
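The FutureWarning repeated above points from FSDP.set_state_dict_type to the torch.distributed.checkpoint.state_dict helpers. A minimal sketch of the replacement call pattern the warning recommends; model and optimizer construction and the process-group setup are placeholders, and cpu_offload merely mirrors the offload_to_cpu option exercised by the failing test:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def save_and_restore(model: FSDP, optim: torch.optim.Optimizer):
    # Gather sharded model + optimizer state via the recommended API.
    options = StateDictOptions(cpu_offload=True)
    model_sd, optim_sd = get_state_dict(model, optim, options=options)

    # ... persist model_sd / optim_sd, e.g. with torch.distributed.checkpoint ...

    # Load both back through the matching setter.
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=options,
    )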
2025-12-04T12:44:29.0762930Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0763416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0763465Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0763601Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0763759Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0764046Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0764198Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0764477Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0764595Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0764867Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0765009Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0765299Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0765442Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0765714Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0765844Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0766118Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0766261Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0766782Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3472883712. 2025-12-04T12:44:29.0766901Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0767092Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0767515Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0767633Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0767836Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0767995Z E1204 12:40:41.384000 342683 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0768129Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0768280Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0768562Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0768711Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0768988Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0769107Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0769376Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0769542Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0769846Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0769986Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0770258Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0770386Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:44:29.0770659Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0770801Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0771330Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1262485504 and is now 3307208704. 2025-12-04T12:44:29.0771450Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0771638Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0772070Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0772178Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0772380Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0772538Z E1204 12:40:41.395000 342686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0772667Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0772820Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0773099Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0773244Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0773522Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0773636Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0773931Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0774073Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0774341Z E1204 12:40:41.430000 342685 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0774480Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0774748Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0774879Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0775150Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0775290Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0775809Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1254096896 and is now 3307208704. 2025-12-04T12:44:29.0775926Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0776126Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0776544Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0776653Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0776854Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0777009Z E1204 12:40:41.430000 342685 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0777143Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0777294Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0777574Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0777722Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0778001Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0778138Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0778406Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0778544Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0778809Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0778949Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0779219Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0779346Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0779645Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0779788Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0780322Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3307208704. 
2025-12-04T12:44:29.0780443Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0780632Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0781051Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0781159Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0781362Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0781524Z E1204 12:40:41.432000 342684 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0781570Z FAILED [10.1203s] [ 16%] 2025-12-04T12:44:29.0781572Z 2025-12-04T12:44:29.0781627Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0781776Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.0781823Z Traceback (most recent call last): 2025-12-04T12:44:29.0781988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0782032Z self._join_processes(fn) 2025-12-04T12:44:29.0782209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0782266Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0782471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0782516Z raise RuntimeError(error) 2025-12-04T12:44:29.0782597Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0782643Z Traceback (most recent call last): 2025-12-04T12:44:29.0782805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0782848Z getattr(self, test_name)() 2025-12-04T12:44:29.0783010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0783044Z fn() 2025-12-04T12:44:29.0783199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0783240Z method(*args, **kwargs) 2025-12-04T12:44:29.0783394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0783434Z method(*args, **kwargs) 2025-12-04T12:44:29.0783587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0783625Z with policy(): 2025-12-04T12:44:29.0783778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0783832Z raise RuntimeError(msg) 2025-12-04T12:44:29.0784232Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3472883712. 2025-12-04T12:44:29.0784251Z 2025-12-04T12:44:29.0784333Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0784636Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0784639Z 2025-12-04T12:44:29.0784728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0784730Z 2025-12-04T12:44:29.0784790Z Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0784837Z Traceback (most recent call last): 2025-12-04T12:44:29.0784999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0785042Z getattr(self, test_name)() 2025-12-04T12:44:29.0785202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0785239Z fn() 2025-12-04T12:44:29.0785393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0785432Z method(*args, **kwargs) 2025-12-04T12:44:29.0785583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0785622Z method(*args, **kwargs) 2025-12-04T12:44:29.0785775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0785812Z with policy(): 2025-12-04T12:44:29.0785966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0786006Z raise RuntimeError(msg) 2025-12-04T12:44:29.0786431Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1262485504 and is now 3307208704. 2025-12-04T12:44:29.0786435Z 2025-12-04T12:44:29.0786509Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0786811Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0786814Z 2025-12-04T12:44:29.0786903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0786905Z 2025-12-04T12:44:29.0786907Z 2025-12-04T12:44:29.0786983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0787072Z Process 0 terminated with exit code 10, terminating remaining processes. 
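The RuntimeError above is raised by the memory-leak-check context manager in torch/testing/_internal/common_utils.py (its __exit__ appears at line 2705 in every traceback), which is active because this shard runs with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it records per-device memory before the test and flags a leak when the caching-allocator figure has not returned to its starting value and driver-allocated memory has also grown. Below is a minimal sketch of that before/after comparison using public torch.cuda calls; the helper name check_leak is hypothetical and this is not the harness's actual implementation.

    import torch

    def check_leak(run_test, device=0):
        # Hypothetical sketch of the comparison described in the error message;
        # not the real common_utils.py leak checker.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)    # caching allocator, bytes
        free_before, total = torch.cuda.mem_get_info(device)  # driver-level view
        run_test()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before and free_after < free_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver allocated "
                f"{total - free_before} -> {total - free_after} bytes"
            )

In this log the caching-allocator figure goes from 0 to 13824 bytes on every rank while the driver-allocated figure grows by roughly 2 GB per device, which is the pattern the check reports as a confirmed leak.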
2025-12-04T12:44:29.0787351Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b149cca1e6d1f159.xml - 2025-12-04T12:44:29.0787414Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0787727Z FAILED [10.1203s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0787787Z Traceback (most recent call last): 2025-12-04T12:44:29.0787952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0787995Z getattr(self, test_name)() 2025-12-04T12:44:29.0788156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0788202Z fn() 2025-12-04T12:44:29.0788355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0788398Z method(*args, **kwargs) 2025-12-04T12:44:29.0788551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0788590Z method(*args, **kwargs) 2025-12-04T12:44:29.0788742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0788780Z with policy(): 2025-12-04T12:44:29.0788933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0788974Z raise RuntimeError(msg) 2025-12-04T12:44:29.0789375Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3472883712. 
2025-12-04T12:44:29.0789379Z 2025-12-04T12:44:29.0789451Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0789800Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0789803Z 2025-12-04T12:44:29.0789888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0789890Z 2025-12-04T12:44:29.0789950Z Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0789994Z Traceback (most recent call last): 2025-12-04T12:44:29.0790157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0790227Z getattr(self, test_name)() 2025-12-04T12:44:29.0790386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0790422Z fn() 2025-12-04T12:44:29.0790573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0790615Z method(*args, **kwargs) 2025-12-04T12:44:29.0790766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0790806Z method(*args, **kwargs) 2025-12-04T12:44:29.0790956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0790995Z with policy(): 2025-12-04T12:44:29.0791147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0791193Z raise RuntimeError(msg) 2025-12-04T12:44:29.0791589Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1262485504 and is now 3307208704. 2025-12-04T12:44:29.0791591Z 2025-12-04T12:44:29.0791664Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0791976Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0791980Z 2025-12-04T12:44:29.0792065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0792143Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0792206Z ======================= 1 failed, 2 deselected in 10.13s ======================= 2025-12-04T12:44:29.0792245Z Got exit code 1 2025-12-04T12:44:29.0792285Z Retrying single test... 
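Each failure block ends with a ready-made repro command (PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py <TestClass.test_name>) and notes that the banner can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0; "Retrying single test..." then starts a fresh pytest session in which everything except the failing test is deselected, as the next session's "7 deselected / 1 selected" line shows. Below is a small convenience wrapper for running that printed command from the repo root with the same environment; rerun_failed_test is a hypothetical helper, not part of the test harness.

    import os
    import subprocess
    import sys

    def rerun_failed_test(test_id: str) -> int:
        # Re-create the environment from the repro line printed in the log.
        env = dict(
            os.environ,
            PYTORCH_TEST_WITH_ROCM="1",            # exercise the ROCm code paths
            PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",  # keep the leak check enabled
        )
        cmd = [
            sys.executable,
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
            test_id,
        ]
        return subprocess.run(cmd, env=env).returncode

    # Test id copied verbatim from the log:
    # rerun_failed_test("TestHSDPWithDeviceMeshAndDTensorCUDA."
    #                   "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda")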
2025-12-04T12:44:29.0792513Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9b1a90df380fa266.xml 2025-12-04T12:44:29.0792571Z ============================= test session starts ============================== 2025-12-04T12:44:29.0792689Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0792731Z cachedir: .pytest_cache 2025-12-04T12:44:29.0792890Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0792935Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0792978Z configfile: pytest.ini 2025-12-04T12:44:29.0793143Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0793216Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0793515Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0793559Z Running 1 items in this shard 2025-12-04T12:44:29.0793562Z 2025-12-04T12:44:29.0793936Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 12:40:45.488000 343084 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 343153 2025-12-04T12:44:29.0794094Z I1204 12:40:45.488000 343084 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 343154 2025-12-04T12:44:29.0794266Z I1204 12:40:45.489000 343084 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 343155 2025-12-04T12:44:29.0794416Z I1204 12:40:45.489000 343084 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 343156 2025-12-04T12:44:29.0795097Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0795142Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0795827Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0795871Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0796541Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0796604Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0797264Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0797310Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0797809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0797860Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0798361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0798408Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0798897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0798963Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0799447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:44:29.0799493Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0799677Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0799833Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0800116Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0800264Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0800543Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0800658Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0800952Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0801091Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0801374Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0801513Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0801781Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0801909Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0802181Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0802325Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0802840Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1113587712 and is now 3491758080. 
2025-12-04T12:44:29.0802951Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0803143Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0803595Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0803703Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0803904Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0804065Z E1204 12:40:54.214000 343156 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0804194Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0804347Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0804626Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0804774Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0805052Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0805177Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0805444Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0805595Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0805864Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0806001Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0806268Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0806395Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0806667Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0806805Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0807320Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1266679808 and is now 3556769792. 2025-12-04T12:44:29.0807428Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0807639Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0808060Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0808165Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0808370Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0808530Z E1204 12:40:54.239000 343155 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0808657Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0808813Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0809091Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0809235Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0809522Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0809671Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0809954Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0810094Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0810362Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.0810503Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0810771Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0810899Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0811169Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0811307Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0811823Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3468689408. 2025-12-04T12:44:29.0811958Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0812145Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0812566Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0812673Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0812875Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0813034Z E1204 12:40:54.723000 343153 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0813164Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0813317Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0813594Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0813755Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0814029Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0814157Z E1204 12:40:54.766000 343154 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0814424Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0814565Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0814840Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0814976Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0815248Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0815374Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0815643Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0815781Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0816325Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3307208704. 
2025-12-04T12:44:29.0816435Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0816625Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0817047Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0817153Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0817359Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0817517Z E1204 12:40:54.766000 343154 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0817561Z FAILED [10.1212s] [100%] 2025-12-04T12:44:29.0817563Z 2025-12-04T12:44:29.0817618Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0817769Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.0817826Z Traceback (most recent call last): 2025-12-04T12:44:29.0817988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0818034Z self._join_processes(fn) 2025-12-04T12:44:29.0818206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0818272Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0818451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0818497Z raise RuntimeError(error) 2025-12-04T12:44:29.0818576Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0818624Z Traceback (most recent call last): 2025-12-04T12:44:29.0818788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0818833Z getattr(self, test_name)() 2025-12-04T12:44:29.0818993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0819030Z fn() 2025-12-04T12:44:29.0819181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0819225Z method(*args, **kwargs) 2025-12-04T12:44:29.0819374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0819415Z method(*args, **kwargs) 2025-12-04T12:44:29.0819563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0819636Z with policy(): 2025-12-04T12:44:29.0819790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0819832Z raise RuntimeError(msg) 2025-12-04T12:44:29.0820226Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1113587712 and is now 3491758080. 2025-12-04T12:44:29.0820230Z 2025-12-04T12:44:29.0820330Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0820633Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0820636Z 2025-12-04T12:44:29.0820721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0820724Z 2025-12-04T12:44:29.0820726Z 2025-12-04T12:44:29.0820801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0820886Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0821159Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9b1a90df380fa266.xml - 2025-12-04T12:44:29.0821221Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0821537Z FAILED [10.1212s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0821585Z Traceback (most recent call last): 2025-12-04T12:44:29.0821750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0821810Z getattr(self, test_name)() 2025-12-04T12:44:29.0821969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0822007Z fn() 2025-12-04T12:44:29.0822159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0822216Z method(*args, **kwargs) 2025-12-04T12:44:29.0822368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0822412Z method(*args, **kwargs) 2025-12-04T12:44:29.0822564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0822604Z with policy(): 2025-12-04T12:44:29.0822756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0822799Z raise RuntimeError(msg) 2025-12-04T12:44:29.0823196Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1113587712 and is now 3491758080. 
2025-12-04T12:44:29.0823199Z 2025-12-04T12:44:29.0823277Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0823588Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0823590Z 2025-12-04T12:44:29.0823677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0823744Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0823807Z ======================= 1 failed, 7 deselected in 10.13s ======================= 2025-12-04T12:44:29.0823847Z Got exit code 1 2025-12-04T12:44:29.0823889Z Retrying single test... 2025-12-04T12:44:29.0824117Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-88af6fa056c2577f.xml 2025-12-04T12:44:29.0824199Z ============================= test session starts ============================== 2025-12-04T12:44:29.0824313Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0824353Z cachedir: .pytest_cache 2025-12-04T12:44:29.0824513Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0824559Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0824603Z configfile: pytest.ini 2025-12-04T12:44:29.0824766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0824839Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0825132Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0825178Z Running 1 items in this shard 2025-12-04T12:44:29.0825181Z 2025-12-04T12:44:29.0825555Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 12:40:58.356000 343554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 343623 2025-12-04T12:44:29.0825711Z I1204 12:40:58.356000 343554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 343624 2025-12-04T12:44:29.0825874Z I1204 12:40:58.357000 343554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 343625 2025-12-04T12:44:29.0826024Z I1204 12:40:58.358000 343554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 343626 2025-12-04T12:44:29.0826715Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.0826757Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0827427Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0827472Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0828135Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0828178Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0828856Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0828899Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0829395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0829443Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0829973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0830021Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0830508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0830570Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0831056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0831116Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0831251Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0831404Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0831688Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0831835Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0832116Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0832231Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0832502Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0832641Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0832909Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0833048Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0833347Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0833476Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0833749Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0833891Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0834409Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3307208704. 
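[editor's note] The FutureWarning repeated above states that FSDP.state_dict_type() / FSDP.set_state_dict_type() are being deprecated in favor of get_state_dict() and set_state_dict() from torch.distributed.checkpoint.state_dict (see the API doc and tutorial URLs in the warning). A minimal sketch of the recommended API is below, assuming an already-wrapped FSDP model and its optimizer; the cpu_offload option mirrors the offload_to_cpu_{False,True} parametrization of the failing test. Exact keyword names should be checked against the linked documentation; this is a sketch, not the test's own code.

import torch
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def checkpoint_roundtrip(model: torch.nn.Module, optim: torch.optim.Optimizer) -> None:
    # Sharded (non-full) state dicts, optionally offloaded to CPU.
    opts = StateDictOptions(full_state_dict=False, cpu_offload=True)
    model_sd, optim_sd = get_state_dict(model, optim, options=opts)
    # ... save / reload model_sd and optim_sd, e.g. via torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=opts,
    )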
2025-12-04T12:44:29.0834518Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0834706Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0835130Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0835252Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0835468Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0835626Z E1204 12:41:07.063000 343626 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0835754Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0835906Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0836186Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0836331Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0836611Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0836725Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0836995Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0837135Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0837407Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0837566Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0837837Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0837964Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0838238Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0838379Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0838898Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3307208704. 2025-12-04T12:44:29.0839006Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0839194Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0839660Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0839782Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0839985Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0840143Z E1204 12:41:07.137000 343624 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0840271Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0840423Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0840701Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0840852Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0841131Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0841245Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0841516Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0841654Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0841945Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.0842083Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0842353Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0842480Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0842750Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0842893Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0843409Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3468689408. 2025-12-04T12:44:29.0843529Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0843716Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0844137Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0844253Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0844456Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0844614Z E1204 12:41:07.137000 343623 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0844743Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0844895Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0845175Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0845322Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0845598Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0845714Z E1204 12:41:07.142000 343625 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0845984Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0846146Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0846415Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0846552Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0846824Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0846951Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0847222Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0847361Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0847876Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3307208704. 
2025-12-04T12:44:29.0847992Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0848181Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0848619Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0848724Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0848927Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0849085Z E1204 12:41:07.142000 343625 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0849129Z FAILED [10.0211s] [100%] 2025-12-04T12:44:29.0849130Z 2025-12-04T12:44:29.0849189Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0849338Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.0849386Z Traceback (most recent call last): 2025-12-04T12:44:29.0849548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0849628Z self._join_processes(fn) 2025-12-04T12:44:29.0849802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0849858Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0850038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0850083Z raise RuntimeError(error) 2025-12-04T12:44:29.0850162Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0850208Z Traceback (most recent call last): 2025-12-04T12:44:29.0850398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0850443Z getattr(self, test_name)() 2025-12-04T12:44:29.0850602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0850637Z fn() 2025-12-04T12:44:29.0850789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0850831Z method(*args, **kwargs) 2025-12-04T12:44:29.0850980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0851021Z method(*args, **kwargs) 2025-12-04T12:44:29.0851171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0851211Z with policy(): 2025-12-04T12:44:29.0851367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0851409Z raise RuntimeError(msg) 2025-12-04T12:44:29.0851812Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3307208704. 2025-12-04T12:44:29.0851829Z 2025-12-04T12:44:29.0851904Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0852207Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0852221Z 2025-12-04T12:44:29.0852309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0852311Z 2025-12-04T12:44:29.0852313Z 2025-12-04T12:44:29.0852391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0852479Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0852752Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-88af6fa056c2577f.xml - 2025-12-04T12:44:29.0852814Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0853127Z FAILED [10.0211s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.0853177Z Traceback (most recent call last): 2025-12-04T12:44:29.0853342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0853387Z getattr(self, test_name)() 2025-12-04T12:44:29.0853549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0853585Z fn() 2025-12-04T12:44:29.0853736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0853778Z method(*args, **kwargs) 2025-12-04T12:44:29.0853930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0853971Z method(*args, **kwargs) 2025-12-04T12:44:29.0854122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0854160Z with policy(): 2025-12-04T12:44:29.0854330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0854373Z raise RuntimeError(msg) 2025-12-04T12:44:29.0854770Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3307208704. 
2025-12-04T12:44:29.0854775Z 2025-12-04T12:44:29.0854850Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0855152Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0855155Z 2025-12-04T12:44:29.0855243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0855306Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0855365Z ======================= 1 failed, 7 deselected in 10.03s ======================= 2025-12-04T12:44:29.0855405Z Got exit code 1 2025-12-04T12:44:29.0855658Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0855798Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.0856025Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-3dbaaa6d6cb49267.xml 2025-12-04T12:44:29.0856085Z ============================= test session starts ============================== 2025-12-04T12:44:29.0856212Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0856256Z cachedir: .pytest_cache 2025-12-04T12:44:29.0856416Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0856462Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0856503Z configfile: pytest.ini 2025-12-04T12:44:29.0856664Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0856737Z collecting ... collected 8 items / 3 deselected / 5 selected 2025-12-04T12:44:29.0856790Z stepcurrent: skipping 3 already run items. 2025-12-04T12:44:29.0856836Z Running 5 items in this shard 2025-12-04T12:44:29.0856838Z 2025-12-04T12:44:29.0857212Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 12:41:10.978000 344024 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 344093 2025-12-04T12:44:29.0857369Z I1204 12:41:10.979000 344024 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 344094 2025-12-04T12:44:29.0857520Z I1204 12:41:10.979000 344024 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 344095 2025-12-04T12:44:29.0857674Z I1204 12:41:10.980000 344024 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 344096 2025-12-04T12:44:29.0858372Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0858417Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0859085Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0859127Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0859820Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0859863Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0860521Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0860581Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0861089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0861140Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0861629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0861677Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0862168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:44:29.0862214Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0862709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0862755Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0862889Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0863069Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0863350Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0863498Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0863778Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0863893Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0864164Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0864306Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0864575Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0864731Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0865000Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0865141Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0865412Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0865550Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0866068Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3449815040. 2025-12-04T12:44:29.0866179Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0866371Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0866795Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0866902Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0867103Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0867260Z E1204 12:41:19.879000 344093 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0867411Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0867562Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0867841Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0867989Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0868268Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0868388Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0868657Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0868798Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0869065Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0869214Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0869485Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0869687Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:44:29.0869961Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0870098Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0870614Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1262485504 and is now 3284140032. 2025-12-04T12:44:29.0870723Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0870914Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0871333Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0871439Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0871643Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0872356Z E1204 12:41:19.936000 344095 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0872487Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0872638Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0872918Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0873064Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0873346Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0873462Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0873731Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0873870Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0874153Z E1204 12:41:19.937000 344096 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0874292Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0874576Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0874703Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0874972Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0875112Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0875627Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3284140032. 2025-12-04T12:44:29.0875734Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0875921Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0876340Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0876446Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0876669Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0876826Z E1204 12:41:19.937000 344096 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0876955Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0877107Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0877388Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0877535Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0877815Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0877930Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0878201Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0878354Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0878622Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0878771Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0879042Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0879169Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0879441Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0879622Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0880137Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3284140032. 
2025-12-04T12:44:29.0880243Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0880434Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0880851Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0880992Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0881197Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0881352Z E1204 12:41:19.950000 344094 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0881394Z FAILED [10.2207s] [ 20%] 2025-12-04T12:44:29.0881397Z 2025-12-04T12:44:29.0881452Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0881599Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.0881645Z Traceback (most recent call last): 2025-12-04T12:44:29.0881809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0881854Z self._join_processes(fn) 2025-12-04T12:44:29.0882028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0882081Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0882259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0882321Z raise RuntimeError(error) 2025-12-04T12:44:29.0882402Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0882446Z Traceback (most recent call last): 2025-12-04T12:44:29.0882609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0882650Z getattr(self, test_name)() 2025-12-04T12:44:29.0882824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0882858Z fn() 2025-12-04T12:44:29.0883012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0883051Z method(*args, **kwargs) 2025-12-04T12:44:29.0883205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0883245Z method(*args, **kwargs) 2025-12-04T12:44:29.0883395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0883434Z with policy(): 2025-12-04T12:44:29.0883588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0883631Z raise RuntimeError(msg) 2025-12-04T12:44:29.0884027Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3449815040. 2025-12-04T12:44:29.0884030Z 2025-12-04T12:44:29.0884108Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0884409Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0884412Z 2025-12-04T12:44:29.0884501Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0884504Z 2025-12-04T12:44:29.0884505Z 2025-12-04T12:44:29.0884581Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0884668Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0884962Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-3dbaaa6d6cb49267.xml - 2025-12-04T12:44:29.0885023Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0885337Z FAILED [10.2207s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.0885384Z Traceback (most recent call last): 2025-12-04T12:44:29.0885550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0885592Z getattr(self, test_name)() 2025-12-04T12:44:29.0885755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0885790Z fn() 2025-12-04T12:44:29.0885941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0885981Z method(*args, **kwargs) 2025-12-04T12:44:29.0886134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0886173Z method(*args, **kwargs) 2025-12-04T12:44:29.0886334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0886371Z with policy(): 2025-12-04T12:44:29.0886524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0886564Z raise RuntimeError(msg) 2025-12-04T12:44:29.0886971Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3449815040. 
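[editor's note] The "Started process N with pid ..." lines and "Process 0 terminated with exit code 10, terminating remaining processes." come from the multiprocess harness in common_distributed.py: each rank runs the test in its own process, a rank that trips the leak check exits with code 10, and the parent's _join_processes/_check_return_codes turn any non-zero exit into the RuntimeError shown in the FAILURES section. The self-contained sketch below only illustrates that join-and-check pattern with the standard multiprocessing module; it is not the actual harness, and exit code 10 is simply mirrored from the log.

import multiprocessing as mp
import sys

def _rank_main(rank: int) -> None:
    # Stand-in for one test rank; pretend rank 0 failed its leak check.
    failed = rank == 0
    sys.exit(10 if failed else 0)

def run_ranks(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_rank_main, args=(r,)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            # The parent surfaces the child failure, as _check_return_codes does above.
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_ranks()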
2025-12-04T12:44:29.0886973Z 2025-12-04T12:44:29.0887048Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0887352Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0887355Z 2025-12-04T12:44:29.0887442Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0887504Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0887566Z ======================= 1 failed, 3 deselected in 10.23s ======================= 2025-12-04T12:44:29.0887603Z Got exit code 1 2025-12-04T12:44:29.0887646Z Retrying single test... 2025-12-04T12:44:29.0887870Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-015a285330bddf1a.xml 2025-12-04T12:44:29.0887928Z ============================= test session starts ============================== 2025-12-04T12:44:29.0888041Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0888084Z cachedir: .pytest_cache 2025-12-04T12:44:29.0888243Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0888288Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0888329Z configfile: pytest.ini 2025-12-04T12:44:29.0888492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0888589Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0888882Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0888927Z Running 1 items in this shard 2025-12-04T12:44:29.0888929Z 2025-12-04T12:44:29.0889302Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 12:41:23.999000 344494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 344563 2025-12-04T12:44:29.0889460Z I1204 12:41:24.000000 344494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 344564 2025-12-04T12:44:29.0889647Z I1204 12:41:24.001000 344494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 344565 2025-12-04T12:44:29.0889801Z I1204 12:41:24.001000 344494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 344566 2025-12-04T12:44:29.0890488Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.0890547Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0891217Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0891274Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0891941Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0891985Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0892649Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0892692Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0893183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0893233Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0893746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0893793Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0894276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0894322Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0894816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0894863Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0894996Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0895159Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0895440Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0895598Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0895875Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0895991Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0896260Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0896401Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0896672Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0896812Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0897079Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0897206Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0897479Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0897617Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0898157Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1268776960 and is now 3284140032. 
2025-12-04T12:44:29.0898266Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0898457Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0898880Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0898986Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0899190Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0899346Z E1204 12:41:32.837000 344566 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0899487Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0899683Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0899963Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0900123Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0900397Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0900514Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0900782Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0900921Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0901190Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0901329Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0901596Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0901724Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0901992Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0902155Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0902669Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3284140032. 2025-12-04T12:44:29.0902778Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0902966Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0903389Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0903495Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0903696Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0903867Z E1204 12:41:32.844000 344565 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0903996Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0904145Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0904434Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0904580Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0904857Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0904972Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0905239Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0905381Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0905648Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:44:29.0905788Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0906057Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0906183Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0906477Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0906616Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0907125Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3284140032. 2025-12-04T12:44:29.0907233Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0907424Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0907845Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0907950Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0908161Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0908318Z E1204 12:41:32.895000 344564 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0908457Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0908609Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0908888Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0909031Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0909308Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0909423Z E1204 12:41:32.906000 344563 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0909735Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0909874Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0910149Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0910289Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0910556Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0910709Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0910980Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0911117Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0911628Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3443523584. 
2025-12-04T12:44:29.0911736Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0911925Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0912352Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0912473Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0912673Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0912851Z E1204 12:41:32.906000 344563 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0912893Z FAILED [10.1198s] [100%] 2025-12-04T12:44:29.0912895Z 2025-12-04T12:44:29.0912950Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0913096Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.0913141Z Traceback (most recent call last): 2025-12-04T12:44:29.0913305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0913349Z self._join_processes(fn) 2025-12-04T12:44:29.0913522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0913575Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0913757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0913800Z raise RuntimeError(error) 2025-12-04T12:44:29.0913880Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.0913925Z Traceback (most recent call last): 2025-12-04T12:44:29.0914089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0914132Z getattr(self, test_name)() 2025-12-04T12:44:29.0914292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0914327Z fn() 2025-12-04T12:44:29.0914481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0914521Z method(*args, **kwargs) 2025-12-04T12:44:29.0914674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0914734Z method(*args, **kwargs) 2025-12-04T12:44:29.0914885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0914924Z with policy(): 2025-12-04T12:44:29.0915076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0915117Z raise RuntimeError(msg) 2025-12-04T12:44:29.0915511Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3284140032. 2025-12-04T12:44:29.0915514Z 2025-12-04T12:44:29.0915590Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0915892Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0915895Z 2025-12-04T12:44:29.0915982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0915984Z 2025-12-04T12:44:29.0915986Z 2025-12-04T12:44:29.0916060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0916156Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.0916425Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-015a285330bddf1a.xml - 2025-12-04T12:44:29.0916484Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0916807Z FAILED [10.1198s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.0916853Z Traceback (most recent call last): 2025-12-04T12:44:29.0917020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0917062Z getattr(self, test_name)() 2025-12-04T12:44:29.0917223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0917258Z fn() 2025-12-04T12:44:29.0917411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0917450Z method(*args, **kwargs) 2025-12-04T12:44:29.0917602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0917644Z method(*args, **kwargs) 2025-12-04T12:44:29.0917795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0917833Z with policy(): 2025-12-04T12:44:29.0917983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0918024Z raise RuntimeError(msg) 2025-12-04T12:44:29.0918421Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3284140032. 
2025-12-04T12:44:29.0918424Z 2025-12-04T12:44:29.0918499Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0918818Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0918820Z 2025-12-04T12:44:29.0918908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0918970Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.0919033Z ======================= 1 failed, 7 deselected in 10.13s ======================= 2025-12-04T12:44:29.0919071Z Got exit code 1 2025-12-04T12:44:29.0919111Z Retrying single test... 2025-12-04T12:44:29.0919338Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-01fc5e09ab395607.xml 2025-12-04T12:44:29.0919397Z ============================= test session starts ============================== 2025-12-04T12:44:29.0919512Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0919553Z cachedir: .pytest_cache 2025-12-04T12:44:29.0919748Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0919793Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0919834Z configfile: pytest.ini 2025-12-04T12:44:29.0919996Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0920083Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.0920376Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0920419Z Running 1 items in this shard 2025-12-04T12:44:29.0920435Z 2025-12-04T12:44:29.0920808Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 12:41:36.726000 344964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 345033 2025-12-04T12:44:29.0920964Z I1204 12:41:36.726000 344964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 345034 2025-12-04T12:44:29.0921117Z I1204 12:41:36.727000 344964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 345035 2025-12-04T12:44:29.0921270Z I1204 12:41:36.727000 344964 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 345036 2025-12-04T12:44:29.0921953Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.0921996Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0922660Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0922702Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0923390Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0923434Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0924097Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0924141Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0924640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0924688Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0925189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0925245Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0925734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0925780Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0926266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0926313Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0926449Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0926603Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0926891Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0927038Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0927315Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0927434Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0927721Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0927861Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0928131Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0928270Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0928541Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0928671Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0928946Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0929085Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0929651Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3424649216. 
2025-12-04T12:44:29.0929776Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0929965Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0930385Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0930493Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0930696Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0930856Z E1204 12:41:45.500000 345034 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0930984Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0931138Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0931416Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0931563Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0931872Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0931988Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0932255Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0932393Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0932662Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0932799Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0933070Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0933197Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0933469Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0933620Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0934135Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3368026112. 2025-12-04T12:44:29.0934253Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0934441Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0934860Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0934967Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0935173Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0935329Z E1204 12:41:45.507000 345035 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.0935458Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0935609Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0935892Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0936037Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0936332Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0936447Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0936716Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0936857Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0937125Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T12:44:29.0937265Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0937534Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0937660Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0937943Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0938082Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0938605Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1254096896 and is now 3368026112. 2025-12-04T12:44:29.0938713Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0938900Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0939320Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0939428Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0939671Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0939828Z E1204 12:41:45.531000 345036 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0939957Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0940107Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0940388Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0940563Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0940839Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0940953Z E1204 12:41:45.978000 345033 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0941220Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0945258Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0945560Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0945700Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0945969Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0946135Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0946419Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0946590Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0947110Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3443523584. 
2025-12-04T12:44:29.0947220Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0947411Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0947834Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0947942Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0948145Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0948300Z E1204 12:41:45.978000 345033 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0948345Z FAILED [10.3195s] [100%] 2025-12-04T12:44:29.0948348Z 2025-12-04T12:44:29.0948404Z =================================== FAILURES =================================== 2025-12-04T12:44:29.0948553Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.0948605Z Traceback (most recent call last): 2025-12-04T12:44:29.0948781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.0948826Z self._join_processes(fn) 2025-12-04T12:44:29.0949001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.0949056Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.0949236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.0949282Z raise RuntimeError(error) 2025-12-04T12:44:29.0949362Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.0949406Z Traceback (most recent call last): 2025-12-04T12:44:29.0949694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0949742Z getattr(self, test_name)() 2025-12-04T12:44:29.0949907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0949942Z fn() 2025-12-04T12:44:29.0950098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0959702Z method(*args, **kwargs) 2025-12-04T12:44:29.0959886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0959984Z method(*args, **kwargs) 2025-12-04T12:44:29.0960143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0960183Z with policy(): 2025-12-04T12:44:29.0960346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.0960414Z raise RuntimeError(msg) 2025-12-04T12:44:29.0960822Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3424649216. 2025-12-04T12:44:29.0960826Z 2025-12-04T12:44:29.0960905Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0961213Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0961216Z 2025-12-04T12:44:29.0961306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0961310Z 2025-12-04T12:44:29.0961371Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.0961419Z Traceback (most recent call last): 2025-12-04T12:44:29.0961587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0961630Z getattr(self, test_name)() 2025-12-04T12:44:29.0961792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0961827Z fn() 2025-12-04T12:44:29.0961979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0962021Z method(*args, **kwargs) 2025-12-04T12:44:29.0962172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0962211Z method(*args, **kwargs) 2025-12-04T12:44:29.0962381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0962419Z with policy(): 2025-12-04T12:44:29.0962572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0962612Z raise RuntimeError(msg) 2025-12-04T12:44:29.0963010Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3368026112. 2025-12-04T12:44:29.0963015Z 2025-12-04T12:44:29.0963090Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0963423Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0963428Z 2025-12-04T12:44:29.0963517Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0963519Z 2025-12-04T12:44:29.0963521Z 2025-12-04T12:44:29.0963600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.0963689Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.0963961Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-01fc5e09ab395607.xml - 2025-12-04T12:44:29.0964045Z =========================== short test summary info ============================ 2025-12-04T12:44:29.0964365Z FAILED [10.3195s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.0964425Z Traceback (most recent call last): 2025-12-04T12:44:29.0964590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0964632Z getattr(self, test_name)() 2025-12-04T12:44:29.0964794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0964829Z fn() 2025-12-04T12:44:29.0964979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0965019Z method(*args, **kwargs) 2025-12-04T12:44:29.0965171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0965210Z method(*args, **kwargs) 2025-12-04T12:44:29.0965362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0965399Z with policy(): 2025-12-04T12:44:29.0965551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0965590Z raise RuntimeError(msg) 2025-12-04T12:44:29.0965992Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3424649216. 
2025-12-04T12:44:29.0965995Z 2025-12-04T12:44:29.0966069Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0966368Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0966371Z 2025-12-04T12:44:29.0966467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0966470Z 2025-12-04T12:44:29.0966529Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.0966574Z Traceback (most recent call last): 2025-12-04T12:44:29.0966738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0966780Z getattr(self, test_name)() 2025-12-04T12:44:29.0966939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0966974Z fn() 2025-12-04T12:44:29.0967124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0967163Z method(*args, **kwargs) 2025-12-04T12:44:29.0967327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0967368Z method(*args, **kwargs) 2025-12-04T12:44:29.0967516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0967553Z with policy(): 2025-12-04T12:44:29.0967702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0967744Z raise RuntimeError(msg) 2025-12-04T12:44:29.0968151Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3368026112. 2025-12-04T12:44:29.0968154Z 2025-12-04T12:44:29.0968248Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0968547Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0968549Z 2025-12-04T12:44:29.0968635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0968699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
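The RuntimeError reported above comes from the harness's CUDA memory-leak check: it records per-device caching-allocator usage before each test and compares it afterwards, and any growth (here 13824 bytes on device 1, alongside driver-allocated memory jumping from 1268776960 to 3424649216 bytes) is raised as a leak; the "CUDA driver API confirmed a leak" wording reflects that the allocator delta is cross-checked against the driver-reported allocation. A minimal Python sketch of that before/after comparison, using only the public torch.cuda API; it is not the actual CudaMemoryLeakCheck implementation in torch.testing._internal.common_utils, which is more involved.

import torch

def assert_no_cuda_leak(test_fn):
    # Sketch only: mirrors the before/after comparison described in the
    # leak messages above, not PyTorch's real CudaMemoryLeakCheck.
    num_devices = torch.cuda.device_count()
    torch.cuda.synchronize()
    before = [torch.cuda.memory_allocated(d) for d in range(num_devices)]

    test_fn()

    torch.cuda.synchronize()
    torch.cuda.empty_cache()  # drop cached blocks so only live tensors stay counted
    after = [torch.cuda.memory_allocated(d) for d in range(num_devices)]

    for device, (b, a) in enumerate(zip(before, after)):
        if a > b:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator allocated "
                f"memory was {b} and is now reported as {a}"
            )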
2025-12-04T12:44:29.0968762Z ======================= 1 failed, 7 deselected in 10.33s ======================= 2025-12-04T12:44:29.0968801Z Got exit code 1 2025-12-04T12:44:29.0969051Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T12:44:29.0969180Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.0969407Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-66bc2c5f652db8bb.xml 2025-12-04T12:44:29.0969467Z ============================= test session starts ============================== 2025-12-04T12:44:29.0969615Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.0969657Z cachedir: .pytest_cache 2025-12-04T12:44:29.0969814Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.0969862Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.0969903Z configfile: pytest.ini 2025-12-04T12:44:29.0970069Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.0970144Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T12:44:29.0970198Z stepcurrent: skipping 4 already run items. 2025-12-04T12:44:29.0970259Z Running 4 items in this shard 2025-12-04T12:44:29.0970261Z 2025-12-04T12:44:29.0970646Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 12:41:49.797000 345434 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 345503 2025-12-04T12:44:29.0970800Z I1204 12:41:49.798000 345434 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 345504 2025-12-04T12:44:29.0970952Z I1204 12:41:49.798000 345434 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 345505 2025-12-04T12:44:29.0971103Z I1204 12:41:49.799000 345434 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 345506 2025-12-04T12:44:29.0971808Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0971852Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0972517Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0972588Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0973257Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0973299Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0973962Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0974005Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0974500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0974552Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0975058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0975106Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0975594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0975641Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0976139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:44:29.0976188Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.0976857Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0976912Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0977578Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0977630Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0978118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0978182Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.0978669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0978727Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.0978963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0979007Z local_shape = tensor.shape 2025-12-04T12:44:29.0979243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0979280Z tensor.shape, 2025-12-04T12:44:29.0979516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0979554Z tensor.dtype, 2025-12-04T12:44:29.0979836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0979880Z local_shape = tensor.shape 2025-12-04T12:44:29.0980112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.0980149Z tensor.shape, 2025-12-04T12:44:29.0980380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0980416Z tensor.dtype, 2025-12-04T12:44:29.0981104Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0981149Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0981814Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.0981873Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.0982374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0982430Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.0982911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.0982968Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.0983205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0983247Z local_shape = tensor.shape 2025-12-04T12:44:29.0983478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0983520Z local_shape = tensor.shape 2025-12-04T12:44:29.0983751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0983788Z tensor.shape, 2025-12-04T12:44:29.0984018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.0984055Z tensor.dtype, 2025-12-04T12:44:29.0984295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0984333Z tensor.shape, 2025-12-04T12:44:29.0984564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.0984600Z tensor.dtype, 2025-12-04T12:44:29.0984737Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0984896Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0985193Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0985342Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0985624Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0985739Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0986010Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0986163Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0986433Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0986582Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0986852Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0986982Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0987251Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0987393Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0987923Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.0988034Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0988224Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0988669Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0988778Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0988980Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0989139Z E1204 12:41:59.717000 345504 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.0989268Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0989420Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0989751Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0989897Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0990174Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0990304Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0990572Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0990725Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0990995Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0991133Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0991401Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0991530Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T12:44:29.0991800Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0991941Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0992466Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.0992575Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0992768Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0993213Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0993320Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0993523Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0993682Z E1204 12:41:59.721000 345506 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.0993820Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0993974Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0994252Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0994399Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0994676Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0994800Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0995073Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0995222Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0995490Z E1204 12:41:59.739000 345503 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0995628Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.0995897Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.0996025Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.0996295Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.0996434Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.0996954Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3707764736. 2025-12-04T12:44:29.0997063Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0997265Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.0997698Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.0997807Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.0998007Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.0998173Z E1204 12:41:59.739000 345503 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.0998303Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.0998455Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.0998731Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.0998888Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.0999163Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.0999289Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.0999558Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.0999741Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1000011Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1000150Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1000421Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1000548Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1000818Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1000956Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1001481Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 958398464 and is now 3315597312. 
2025-12-04T12:44:29.1001601Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1001788Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1002217Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1002324Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1002540Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1002698Z E1204 12:41:59.748000 345505 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1002739Z FAILED [11.4211s] [ 25%] 2025-12-04T12:44:29.1002741Z 2025-12-04T12:44:29.1002799Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1002956Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.1003003Z Traceback (most recent call last): 2025-12-04T12:44:29.1003179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1003223Z self._join_processes(fn) 2025-12-04T12:44:29.1003396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1003463Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1003643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1003687Z raise RuntimeError(error) 2025-12-04T12:44:29.1003766Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.1003812Z Traceback (most recent call last): 2025-12-04T12:44:29.1003972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1004018Z getattr(self, test_name)() 2025-12-04T12:44:29.1004176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1004210Z fn() 2025-12-04T12:44:29.1004363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1004406Z method(*args, **kwargs) 2025-12-04T12:44:29.1004558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1004599Z method(*args, **kwargs) 2025-12-04T12:44:29.1004749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1004787Z with policy(): 2025-12-04T12:44:29.1004940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.1004981Z raise RuntimeError(msg) 2025-12-04T12:44:29.1005399Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1005402Z 2025-12-04T12:44:29.1005485Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1005800Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1005802Z 2025-12-04T12:44:29.1005889Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1005891Z 2025-12-04T12:44:29.1005950Z Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1005996Z Traceback (most recent call last): 2025-12-04T12:44:29.1006160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1006200Z getattr(self, test_name)() 2025-12-04T12:44:29.1006369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1006405Z fn() 2025-12-04T12:44:29.1006556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1006596Z method(*args, **kwargs) 2025-12-04T12:44:29.1006746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1006964Z method(*args, **kwargs) 2025-12-04T12:44:29.1007116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1007164Z with policy(): 2025-12-04T12:44:29.1007315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1007355Z raise RuntimeError(msg) 2025-12-04T12:44:29.1007765Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1007778Z 2025-12-04T12:44:29.1007853Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1008161Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1008165Z 2025-12-04T12:44:29.1008250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1008252Z 2025-12-04T12:44:29.1008254Z 2025-12-04T12:44:29.1008330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1008417Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.1008691Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-66bc2c5f652db8bb.xml - 2025-12-04T12:44:29.1008751Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1009075Z FAILED [11.4211s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.1009121Z Traceback (most recent call last): 2025-12-04T12:44:29.1009286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1009328Z getattr(self, test_name)() 2025-12-04T12:44:29.1009489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1009523Z fn() 2025-12-04T12:44:29.1009719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1009760Z method(*args, **kwargs) 2025-12-04T12:44:29.1009909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1009949Z method(*args, **kwargs) 2025-12-04T12:44:29.1010096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1010134Z with policy(): 2025-12-04T12:44:29.1010285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1010326Z raise RuntimeError(msg) 2025-12-04T12:44:29.1010746Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 
2025-12-04T12:44:29.1010750Z 2025-12-04T12:44:29.1010824Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1011131Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1011146Z 2025-12-04T12:44:29.1011232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1011234Z 2025-12-04T12:44:29.1011292Z Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1011336Z Traceback (most recent call last): 2025-12-04T12:44:29.1011500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1011565Z getattr(self, test_name)() 2025-12-04T12:44:29.1011725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1011759Z fn() 2025-12-04T12:44:29.1011910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1011949Z method(*args, **kwargs) 2025-12-04T12:44:29.1012098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1012137Z method(*args, **kwargs) 2025-12-04T12:44:29.1012289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1012325Z with policy(): 2025-12-04T12:44:29.1012479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1012518Z raise RuntimeError(msg) 2025-12-04T12:44:29.1012922Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1012925Z 2025-12-04T12:44:29.1012996Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1013306Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1013308Z 2025-12-04T12:44:29.1013396Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1013471Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1013536Z ======================= 1 failed, 4 deselected in 11.43s ======================= 2025-12-04T12:44:29.1013572Z Got exit code 1 2025-12-04T12:44:29.1013612Z Retrying single test... 
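"Retrying single test..." means the runner now reruns only the failing node id in a fresh pytest session before classifying it as flaky or consistently failing. A rough sketch of that step follows (hypothetical helper name; the real logic, including the stepcurrent bookkeeping and the test-report paths seen below, lives in PyTorch's test/run_test.py).

import subprocess
import sys

def rerun_single_test(node_id, env, retries=2):
    # Hypothetical simplification of the CI runner's retry step: rerun one
    # failing pytest node id in isolation and report whether it recovers.
    for _ in range(retries):
        proc = subprocess.run(
            [sys.executable, "-m", "pytest", "-v", node_id],
            env=env,
        )
        if proc.returncode == 0:
            return "flaky"  # passed on rerun
    return "failed consistently"  # still failing after every rerun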
2025-12-04T12:44:29.1013839Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-641b00e6e5af3cdf.xml 2025-12-04T12:44:29.1013897Z ============================= test session starts ============================== 2025-12-04T12:44:29.1014012Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1014053Z cachedir: .pytest_cache 2025-12-04T12:44:29.1014216Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1014275Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1014315Z configfile: pytest.ini 2025-12-04T12:44:29.1014485Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1014557Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1014861Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1014913Z Running 1 items in this shard 2025-12-04T12:44:29.1014915Z 2025-12-04T12:44:29.1015299Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 12:42:03.803000 345972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 346041 2025-12-04T12:44:29.1015466Z I1204 12:42:03.804000 345972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 346042 2025-12-04T12:44:29.1015617Z I1204 12:42:03.804000 345972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 346043 2025-12-04T12:44:29.1015770Z I1204 12:42:03.805000 345972 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 346044 2025-12-04T12:44:29.1016451Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1016498Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1017167Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1017210Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1017876Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1017930Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1018591Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1018633Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1019142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1019194Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1019723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1019787Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1020273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1020330Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1020818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1020866Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1021538Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1021580Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1022248Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1022290Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1022965Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1023008Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1023670Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1023711Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1024219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1024277Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1024758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1024828Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1025309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1025375Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1025853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1025909Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1026148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1026193Z local_shape = tensor.shape 2025-12-04T12:44:29.1026428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1026464Z tensor.shape, 2025-12-04T12:44:29.1026697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1026735Z tensor.dtype, 2025-12-04T12:44:29.1026968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1027010Z local_shape = tensor.shape 2025-12-04T12:44:29.1027251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1027288Z tensor.shape, 2025-12-04T12:44:29.1027519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1027555Z tensor.dtype, 2025-12-04T12:44:29.1027784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1027826Z local_shape = tensor.shape 2025-12-04T12:44:29.1028057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1028092Z tensor.shape, 2025-12-04T12:44:29.1028342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1028379Z tensor.dtype, 2025-12-04T12:44:29.1028608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1028649Z local_shape = tensor.shape 2025-12-04T12:44:29.1028878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1028936Z tensor.shape, 2025-12-04T12:44:29.1029167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.1029203Z tensor.dtype, 2025-12-04T12:44:29.1029349Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1029506Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1029833Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1029982Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1030262Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1030379Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1030650Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1030790Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1031062Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1031201Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1031471Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1031614Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1031887Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1032028Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1032567Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3946840064. 
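The RuntimeError above is raised by the test harness when per-device memory after the test body stays above the value recorded before it. A rough, illustrative sketch of the two quantities the message reports (caching-allocator bytes and driver-allocated bytes); this is only an approximation for reading the numbers, not PyTorch's actual leak checker:

import torch

def memory_snapshot(device: int) -> dict:
    free, total = torch.cuda.mem_get_info(device)
    return {
        "caching_allocator": torch.cuda.memory_allocated(device),  # bytes currently held by live tensors
        "driver_allocated": total - free,                          # bytes this process has taken from the driver
    }

before = memory_snapshot(0)
x = torch.ones(1024, 1024, device="cuda:0")   # stand-in for what the test allocates
del x
torch.cuda.empty_cache()                      # release cached blocks back to the driver
after = memory_snapshot(0)
# A leak shows up as `after` remaining above `before` even after cleanup.
print(before, after)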
2025-12-04T12:44:29.1032679Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1032869Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1033300Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1033423Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1033627Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1033798Z E1204 12:42:13.571000 346041 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1033927Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1034079Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1034360Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1034507Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1034784Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1034901Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1035174Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1035313Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1035583Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1035722Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1036006Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1036133Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1036402Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1036542Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1037078Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3584032768. 2025-12-04T12:44:29.1037187Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1037376Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1037807Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1037924Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1038139Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1038296Z E1204 12:42:13.573000 346042 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1038424Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1038578Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1038857Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1039003Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1039283Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1039398Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1039709Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1039850Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1040119Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1040271Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1040539Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1040664Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1040934Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1041073Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1041608Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1041715Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1041902Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1042347Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1042465Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1042670Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1042826Z E1204 12:42:14.000000 346044 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1042955Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1043108Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1043387Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1043537Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1043813Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1043927Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1044195Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1044335Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1044614Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1044755Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1045022Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1045149Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1045435Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1045576Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1046100Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1262485504 and is now 3315597312. 
2025-12-04T12:44:29.1046219Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1046405Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1046833Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1046949Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1047153Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1047308Z E1204 12:42:14.065000 346043 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1047350Z FAILED [11.1205s] [100%] 2025-12-04T12:44:29.1047352Z 2025-12-04T12:44:29.1047408Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1047565Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.1047613Z Traceback (most recent call last): 2025-12-04T12:44:29.1047778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1047821Z self._join_processes(fn) 2025-12-04T12:44:29.1047993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1048049Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1048230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1048274Z raise RuntimeError(error) 2025-12-04T12:44:29.1048353Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1048398Z Traceback (most recent call last): 2025-12-04T12:44:29.1048559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1048613Z getattr(self, test_name)() 2025-12-04T12:44:29.1048774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1048811Z fn() 2025-12-04T12:44:29.1048963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1049004Z method(*args, **kwargs) 2025-12-04T12:44:29.1049156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1049197Z method(*args, **kwargs) 2025-12-04T12:44:29.1049350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1049390Z with policy(): 2025-12-04T12:44:29.1049552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.1049636Z raise RuntimeError(msg) 2025-12-04T12:44:29.1050042Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3946840064. 2025-12-04T12:44:29.1050046Z 2025-12-04T12:44:29.1050121Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1050448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1050451Z 2025-12-04T12:44:29.1050539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1050553Z 2025-12-04T12:44:29.1050554Z 2025-12-04T12:44:29.1050631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1050717Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1050990Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-641b00e6e5af3cdf.xml - 2025-12-04T12:44:29.1051050Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1051374Z FAILED [11.1205s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1051422Z Traceback (most recent call last): 2025-12-04T12:44:29.1051588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1051634Z getattr(self, test_name)() 2025-12-04T12:44:29.1051793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1051831Z fn() 2025-12-04T12:44:29.1051983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1052025Z method(*args, **kwargs) 2025-12-04T12:44:29.1052175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1052218Z method(*args, **kwargs) 2025-12-04T12:44:29.1052370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1052409Z with policy(): 2025-12-04T12:44:29.1052562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1052617Z raise RuntimeError(msg) 2025-12-04T12:44:29.1053018Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3946840064. 
2025-12-04T12:44:29.1053020Z 2025-12-04T12:44:29.1053096Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1053407Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1053409Z 2025-12-04T12:44:29.1053507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1053572Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1053634Z ======================= 1 failed, 7 deselected in 11.13s ======================= 2025-12-04T12:44:29.1053671Z Got exit code 1 2025-12-04T12:44:29.1053710Z Retrying single test... 2025-12-04T12:44:29.1053937Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-caddf5c446c0670f.xml 2025-12-04T12:44:29.1053994Z ============================= test session starts ============================== 2025-12-04T12:44:29.1054119Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1054160Z cachedir: .pytest_cache 2025-12-04T12:44:29.1054319Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1054375Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1054415Z configfile: pytest.ini 2025-12-04T12:44:29.1054579Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1054650Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1054949Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1054993Z Running 1 items in this shard 2025-12-04T12:44:29.1054996Z 2025-12-04T12:44:29.1055378Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 12:42:17.697000 346510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 346579 2025-12-04T12:44:29.1055535Z I1204 12:42:17.697000 346510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 346580 2025-12-04T12:44:29.1055690Z I1204 12:42:17.698000 346510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 346581 2025-12-04T12:44:29.1055840Z I1204 12:42:17.699000 346510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 346582 2025-12-04T12:44:29.1056531Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1056575Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1057250Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1057294Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1057966Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1058012Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1058673Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1058726Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1059220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1059279Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1059797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1059845Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1060330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1060377Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1060859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1060906Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1061596Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1061639Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1062297Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1062338Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1062833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1062893Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1063371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1063442Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1063677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1063735Z local_shape = tensor.shape 2025-12-04T12:44:29.1063970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1064009Z tensor.shape, 2025-12-04T12:44:29.1064241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1064280Z tensor.dtype, 2025-12-04T12:44:29.1064509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.1064553Z local_shape = tensor.shape 2025-12-04T12:44:29.1064785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1064824Z tensor.shape, 2025-12-04T12:44:29.1065056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1065091Z tensor.dtype, 2025-12-04T12:44:29.1065760Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1065803Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1066475Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1066517Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1067000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1067058Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1067548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1067605Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1067838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1067890Z local_shape = tensor.shape 2025-12-04T12:44:29.1068120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1068157Z tensor.shape, 2025-12-04T12:44:29.1068389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.1068434Z tensor.dtype, 2025-12-04T12:44:29.1068666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1068708Z local_shape = tensor.shape 2025-12-04T12:44:29.1068940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1068978Z tensor.shape, 2025-12-04T12:44:29.1069209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1069244Z tensor.dtype, 2025-12-04T12:44:29.1069381Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1069536Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1069873Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1070020Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1070299Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1070417Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1070701Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1070841Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1071107Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1071248Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1071527Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1071660Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1071929Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1072068Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1072610Z E1204 12:42:27.496000 346580 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1072732Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1072921Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1073348Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1073459Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1073666Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1073825Z E1204 12:42:27.496000 346580 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1073955Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1074105Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1074386Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1074532Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1074809Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1074933Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1075205Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1075347Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1075616Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1075762Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1076043Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1076170Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1076440Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1076591Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1077118Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1077241Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1077429Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1077855Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1077962Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1078168Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1078327Z E1204 12:42:27.498000 346581 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1078457Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1078608Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1078888Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1079034Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1079325Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1079438Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1079746Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.1079885Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1080165Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1080309Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1080574Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1080703Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1080974Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1081130Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1081655Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 952107008 and is now 3315597312. 
2025-12-04T12:44:29.1081773Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1081962Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1082397Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1082509Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1082709Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1082867Z E1204 12:42:27.553000 346582 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1082997Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1083150Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1083432Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1083593Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1083868Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1083982Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1084252Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1084391Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1084675Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1084815Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1085082Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1085221Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1085491Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1085643Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1086163Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3705667584. 2025-12-04T12:44:29.1086272Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1086462Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1086891Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1087001Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1087200Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1087358Z E1204 12:42:27.564000 346579 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1087399Z FAILED [11.2209s] [100%] 2025-12-04T12:44:29.1087401Z 2025-12-04T12:44:29.1087458Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1087613Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T12:44:29.1087662Z Traceback (most recent call last): 2025-12-04T12:44:29.1087834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1087877Z self._join_processes(fn) 2025-12-04T12:44:29.1088049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1088102Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1088282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1088326Z raise RuntimeError(error) 2025-12-04T12:44:29.1088406Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.1088450Z Traceback (most recent call last): 2025-12-04T12:44:29.1088623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1088667Z getattr(self, test_name)() 2025-12-04T12:44:29.1088826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1088860Z fn() 2025-12-04T12:44:29.1089011Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1089051Z method(*args, **kwargs) 2025-12-04T12:44:29.1089202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1089253Z method(*args, **kwargs) 2025-12-04T12:44:29.1089403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1089439Z with policy(): 2025-12-04T12:44:29.1089622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1089679Z raise RuntimeError(msg) 2025-12-04T12:44:29.1090088Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 2025-12-04T12:44:29.1090090Z 2025-12-04T12:44:29.1090164Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1090473Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1090476Z 2025-12-04T12:44:29.1090563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1090567Z 2025-12-04T12:44:29.1090626Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1090673Z Traceback (most recent call last): 2025-12-04T12:44:29.1090836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1090878Z getattr(self, test_name)() 2025-12-04T12:44:29.1091042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1091078Z fn() 2025-12-04T12:44:29.1091230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1091270Z method(*args, **kwargs) 2025-12-04T12:44:29.1091419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1091459Z method(*args, **kwargs) 2025-12-04T12:44:29.1091628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1091667Z with policy(): 2025-12-04T12:44:29.1091817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1091858Z raise RuntimeError(msg) 2025-12-04T12:44:29.1092265Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3315597312. 
2025-12-04T12:44:29.1092268Z 2025-12-04T12:44:29.1092341Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1092665Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1092669Z 2025-12-04T12:44:29.1092756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1092758Z 2025-12-04T12:44:29.1092815Z Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1092859Z Traceback (most recent call last): 2025-12-04T12:44:29.1093023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1093089Z getattr(self, test_name)() 2025-12-04T12:44:29.1093249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1093283Z fn() 2025-12-04T12:44:29.1093436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1093489Z method(*args, **kwargs) 2025-12-04T12:44:29.1093640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1093679Z method(*args, **kwargs) 2025-12-04T12:44:29.1093828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1093865Z with policy(): 2025-12-04T12:44:29.1094016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1094060Z raise RuntimeError(msg) 2025-12-04T12:44:29.1094466Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 952107008 and is now 3315597312. 2025-12-04T12:44:29.1094470Z 2025-12-04T12:44:29.1094545Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1094858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1094860Z 2025-12-04T12:44:29.1094949Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1094951Z 2025-12-04T12:44:29.1094953Z 2025-12-04T12:44:29.1095030Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1095117Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.1095393Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-caddf5c446c0670f.xml - 2025-12-04T12:44:29.1095454Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1095789Z FAILED [11.2209s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T12:44:29.1095836Z Traceback (most recent call last): 2025-12-04T12:44:29.1096001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1096043Z getattr(self, test_name)() 2025-12-04T12:44:29.1096204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1096238Z fn() 2025-12-04T12:44:29.1096388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1096441Z method(*args, **kwargs) 2025-12-04T12:44:29.1096596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1096635Z method(*args, **kwargs) 2025-12-04T12:44:29.1096789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1096827Z with policy(): 2025-12-04T12:44:29.1096978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1097031Z raise RuntimeError(msg) 2025-12-04T12:44:29.1097438Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3315597312. 
2025-12-04T12:44:29.1097450Z 2025-12-04T12:44:29.1097523Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1097829Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1097831Z 2025-12-04T12:44:29.1097918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1097921Z 2025-12-04T12:44:29.1097978Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1098025Z Traceback (most recent call last): 2025-12-04T12:44:29.1098187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1098228Z getattr(self, test_name)() 2025-12-04T12:44:29.1098390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1098426Z fn() 2025-12-04T12:44:29.1098577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1098615Z method(*args, **kwargs) 2025-12-04T12:44:29.1098767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1098806Z method(*args, **kwargs) 2025-12-04T12:44:29.1098954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1098990Z with policy(): 2025-12-04T12:44:29.1099143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1099184Z raise RuntimeError(msg) 2025-12-04T12:44:29.1099645Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3315597312. 
2025-12-04T12:44:29.1099649Z 2025-12-04T12:44:29.1099724Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1100032Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1100036Z 2025-12-04T12:44:29.1100123Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1100125Z 2025-12-04T12:44:29.1100182Z Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1100228Z Traceback (most recent call last): 2025-12-04T12:44:29.1100401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1100445Z getattr(self, test_name)() 2025-12-04T12:44:29.1100604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1100641Z fn() 2025-12-04T12:44:29.1100789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1100830Z method(*args, **kwargs) 2025-12-04T12:44:29.1100978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1101035Z method(*args, **kwargs) 2025-12-04T12:44:29.1101185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1101221Z with policy(): 2025-12-04T12:44:29.1101375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1101429Z raise RuntimeError(msg) 2025-12-04T12:44:29.1101832Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 952107008 and is now 3315597312. 2025-12-04T12:44:29.1101835Z 2025-12-04T12:44:29.1101908Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1102217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1102219Z 2025-12-04T12:44:29.1102304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1102370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:44:29.1102432Z ======================= 1 failed, 7 deselected in 11.23s ======================= 2025-12-04T12:44:29.1102471Z Got exit code 1 2025-12-04T12:44:29.1102732Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T12:44:29.1102864Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.1103094Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d1832f138396539a.xml 2025-12-04T12:44:29.1103154Z ============================= test session starts ============================== 2025-12-04T12:44:29.1103267Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1103310Z cachedir: .pytest_cache 2025-12-04T12:44:29.1103479Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1103528Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1103569Z configfile: pytest.ini 2025-12-04T12:44:29.1103736Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1103806Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T12:44:29.1103860Z stepcurrent: skipping 5 already run items. 2025-12-04T12:44:29.1103905Z Running 3 items in this shard 2025-12-04T12:44:29.1103907Z 2025-12-04T12:44:29.1104296Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 12:42:31.257000 347048 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 347117 2025-12-04T12:44:29.1104455Z I1204 12:42:31.257000 347048 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 347118 2025-12-04T12:44:29.1104606Z I1204 12:42:31.258000 347048 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 347119 2025-12-04T12:44:29.1104756Z I1204 12:42:31.259000 347048 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 347120 2025-12-04T12:44:29.1105440Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1105507Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1106176Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1106220Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1106885Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1106927Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1107591Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1107635Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1108146Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1108199Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1108681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1108732Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1109239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1109286Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1109897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:44:29.1109961Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1110633Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1110689Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1111352Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1111397Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1111882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1111942Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1112424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1112482Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1113165Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1113207Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1113867Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1113923Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1114409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1114468Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1114948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1115016Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1115263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1115306Z local_shape = tensor.shape 2025-12-04T12:44:29.1115542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1115579Z tensor.shape, 2025-12-04T12:44:29.1115813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1115850Z tensor.dtype, 2025-12-04T12:44:29.1116082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1116125Z local_shape = tensor.shape 2025-12-04T12:44:29.1116358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1116393Z tensor.shape, 2025-12-04T12:44:29.1116623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1116658Z tensor.dtype, 2025-12-04T12:44:29.1116892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1116934Z local_shape = tensor.shape 2025-12-04T12:44:29.1117165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1117203Z tensor.shape, 2025-12-04T12:44:29.1117441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.1117479Z tensor.dtype, 2025-12-04T12:44:29.1117709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1117751Z local_shape = tensor.shape 2025-12-04T12:44:29.1117984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1118022Z tensor.shape, 2025-12-04T12:44:29.1118264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1118302Z tensor.dtype, 2025-12-04T12:44:29.1118438Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1118593Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1118877Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1119034Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1119314Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1119440Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1119753Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1119893Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1120163Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1120302Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1120572Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1120701Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1120973Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1121114Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1121642Z E1204 12:42:40.524000 347117 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3682598912. 2025-12-04T12:44:29.1121767Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1121955Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1122386Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1122494Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1122710Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1122871Z E1204 12:42:40.524000 347117 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1123000Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1123154Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1123434Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1123594Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1123872Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1124000Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1124269Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1124407Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1124676Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1124814Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1125084Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1125212Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1125483Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1125623Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1126163Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1262485504 and is now 3292528640. 2025-12-04T12:44:29.1126271Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1126459Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1126890Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1127005Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1127209Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1127368Z E1204 12:42:40.600000 347119 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1127497Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1127649Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1127939Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1128086Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1128370Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1128485Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1128754Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.1128893Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1129163Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1129302Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1129616Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1129741Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1130011Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1130152Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1130692Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3292528640. 
2025-12-04T12:44:29.1130801Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1130990Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1131433Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1131542Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1131749Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1131908Z E1204 12:42:40.616000 347120 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1132047Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1132199Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1132480Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1132640Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1132916Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1133030Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1133299Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1133440Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1133711Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1133848Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1134116Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1134242Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1134517Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1134667Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1135186Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640. 2025-12-04T12:44:29.1135294Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1135483Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1135916Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1136023Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1136223Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1136390Z E1204 12:42:40.616000 347118 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1136430Z FAILED [10.6227s] [ 33%] 2025-12-04T12:44:29.1136433Z 2025-12-04T12:44:29.1136489Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1136656Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.1136703Z Traceback (most recent call last): 2025-12-04T12:44:29.1136869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1136913Z self._join_processes(fn) 2025-12-04T12:44:29.1137085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1137141Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1137323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1137369Z raise RuntimeError(error) 2025-12-04T12:44:29.1137447Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1137495Z Traceback (most recent call last): 2025-12-04T12:44:29.1137660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1137705Z getattr(self, test_name)() 2025-12-04T12:44:29.1137864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1137900Z fn() 2025-12-04T12:44:29.1138051Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1138095Z method(*args, **kwargs) 2025-12-04T12:44:29.1138248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1138293Z method(*args, **kwargs) 2025-12-04T12:44:29.1138444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1138488Z with policy(): 2025-12-04T12:44:29.1138653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1138698Z raise RuntimeError(msg) 2025-12-04T12:44:29.1139135Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3682598912. 2025-12-04T12:44:29.1139138Z 2025-12-04T12:44:29.1139231Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1139564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1139566Z 2025-12-04T12:44:29.1139725Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1139727Z 2025-12-04T12:44:29.1139730Z 2025-12-04T12:44:29.1139829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1139970Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.1140276Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d1832f138396539a.xml - 2025-12-04T12:44:29.1140346Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1140708Z FAILED [10.6227s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1140764Z Traceback (most recent call last): 2025-12-04T12:44:29.1140978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1141036Z getattr(self, test_name)() 2025-12-04T12:44:29.1141218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1141263Z fn() 2025-12-04T12:44:29.1141442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1141487Z method(*args, **kwargs) 2025-12-04T12:44:29.1141677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1141728Z method(*args, **kwargs) 2025-12-04T12:44:29.1141901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1141950Z with policy(): 2025-12-04T12:44:29.1142129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1142209Z raise RuntimeError(msg) 2025-12-04T12:44:29.1142629Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3682598912. 2025-12-04T12:44:29.1142631Z 2025-12-04T12:44:29.1142730Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1143053Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1143056Z 2025-12-04T12:44:29.1143161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1143260Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1143351Z ======================= 1 failed, 5 deselected in 10.63s ======================= 2025-12-04T12:44:29.1143401Z Got exit code 1 2025-12-04T12:44:29.1143470Z Retrying single test... 
2025-12-04T12:44:29.1143707Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-824ccfd58c5f3ae3.xml 2025-12-04T12:44:29.1143797Z ============================= test session starts ============================== 2025-12-04T12:44:29.1143939Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1143992Z cachedir: .pytest_cache 2025-12-04T12:44:29.1144190Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1144249Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1144320Z configfile: pytest.ini 2025-12-04T12:44:29.1144500Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1144605Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1144918Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1144998Z Running 1 items in this shard 2025-12-04T12:44:29.1145000Z 2025-12-04T12:44:29.1145389Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 12:42:44.624000 347586 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 347655 2025-12-04T12:44:29.1145606Z I1204 12:42:44.625000 347586 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 347656 2025-12-04T12:44:29.1145785Z I1204 12:42:44.625000 347586 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 347657 2025-12-04T12:44:29.1145948Z I1204 12:42:44.626000 347586 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 347658 2025-12-04T12:44:29.1146650Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1146703Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1147413Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1147482Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1148168Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1148240Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1148909Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1148997Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1149526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1149659Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1150165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1150244Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1150768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1150852Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1151349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1151425Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1152107Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1152189Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1152881Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1152933Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1153464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1153558Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1154055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1154139Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1154837Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1154907Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1155594Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1155664Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1156192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1156259Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1156769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1156856Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1157117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1157184Z local_shape = tensor.shape 2025-12-04T12:44:29.1157434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1157495Z tensor.shape, 2025-12-04T12:44:29.1157734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1159563Z tensor.dtype, 2025-12-04T12:44:29.1159853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1159922Z local_shape = tensor.shape 2025-12-04T12:44:29.1160184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1160240Z tensor.shape, 2025-12-04T12:44:29.1160501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1160568Z tensor.dtype, 2025-12-04T12:44:29.1160822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1160881Z local_shape = tensor.shape 2025-12-04T12:44:29.1161129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1161210Z local_shape = tensor.shape 2025-12-04T12:44:29.1161471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1161518Z tensor.shape, 2025-12-04T12:44:29.1161771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1161825Z tensor.dtype, 2025-12-04T12:44:29.1162088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1162158Z tensor.shape, 2025-12-04T12:44:29.1162413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.1162474Z tensor.dtype, 2025-12-04T12:44:29.1162641Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1162825Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1163121Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1163289Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1163588Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1163728Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1164004Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1164188Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1164482Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1164639Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1164929Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1165077Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1165386Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1165536Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1166104Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3477078016. 
2025-12-04T12:44:29.1166237Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1166433Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1166899Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1167037Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1167266Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1167445Z E1204 12:42:53.925000 347658 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1167593Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1167791Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1168087Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1168255Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1168552Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1168685Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1168983Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1169153Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1169447Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1169640Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1169944Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1170100Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1170401Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1170551Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1171115Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3542089728. 2025-12-04T12:44:29.1171247Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1171458Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1171914Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1172048Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1172292Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1172467Z E1204 12:42:53.984000 347657 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1172617Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1172796Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1173089Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1173266Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1173561Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1173710Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1173995Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1174165Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1174455Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1174617Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1174925Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1175069Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1175372Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1175531Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1176078Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3676307456. 2025-12-04T12:44:29.1176213Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1176425Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1176887Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1177019Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1177243Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1177434Z E1204 12:42:54.491000 347655 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1177584Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1177759Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1178051Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1178222Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1178505Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1178661Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1178954Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1179107Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1179422Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1179565Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1179949Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1180090Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1180404Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1180574Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1181101Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640. 
2025-12-04T12:44:29.1181268Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1181469Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1181926Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1182069Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1182276Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1182475Z E1204 12:42:54.493000 347656 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1182526Z FAILED [10.7208s] [100%] 2025-12-04T12:44:29.1182528Z 2025-12-04T12:44:29.1182612Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1182779Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.1182845Z Traceback (most recent call last): 2025-12-04T12:44:29.1183031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1183102Z self._join_processes(fn) 2025-12-04T12:44:29.1183294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1183370Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1183564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1183639Z raise RuntimeError(error) 2025-12-04T12:44:29.1183734Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1183809Z Traceback (most recent call last): 2025-12-04T12:44:29.1184011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1184067Z getattr(self, test_name)() 2025-12-04T12:44:29.1184260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1184316Z fn() 2025-12-04T12:44:29.1184496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1184548Z method(*args, **kwargs) 2025-12-04T12:44:29.1184728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1184773Z method(*args, **kwargs) 2025-12-04T12:44:29.1184976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1185023Z with policy(): 2025-12-04T12:44:29.1185220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T12:44:29.1185271Z raise RuntimeError(msg) 2025-12-04T12:44:29.1185694Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3477078016. 2025-12-04T12:44:29.1185697Z 2025-12-04T12:44:29.1185799Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1186150Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1186153Z 2025-12-04T12:44:29.1186263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1186275Z 2025-12-04T12:44:29.1186277Z 2025-12-04T12:44:29.1186366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1186477Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1186772Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-824ccfd58c5f3ae3.xml - 2025-12-04T12:44:29.1186862Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1187194Z FAILED [10.7208s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1187264Z Traceback (most recent call last): 2025-12-04T12:44:29.1187455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1187520Z getattr(self, test_name)() 2025-12-04T12:44:29.1187709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1187755Z fn() 2025-12-04T12:44:29.1187934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1187985Z method(*args, **kwargs) 2025-12-04T12:44:29.1188167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1188224Z method(*args, **kwargs) 2025-12-04T12:44:29.1188400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1188448Z with policy(): 2025-12-04T12:44:29.1188641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1188687Z raise RuntimeError(msg) 2025-12-04T12:44:29.1189138Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 1268776960 and is now 3477078016. 
2025-12-04T12:44:29.1189141Z 2025-12-04T12:44:29.1189237Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1189564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1189566Z 2025-12-04T12:44:29.1189743Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1189815Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1189915Z ======================= 1 failed, 7 deselected in 10.73s ======================= 2025-12-04T12:44:29.1189963Z Got exit code 1 2025-12-04T12:44:29.1190032Z Retrying single test... 2025-12-04T12:44:29.1190268Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d88bd703a098f5c0.xml 2025-12-04T12:44:29.1190344Z ============================= test session starts ============================== 2025-12-04T12:44:29.1190497Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1190581Z cachedir: .pytest_cache 2025-12-04T12:44:29.1190750Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1190838Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1190891Z configfile: pytest.ini 2025-12-04T12:44:29.1191091Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1191198Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1191509Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1191576Z Running 1 items in this shard 2025-12-04T12:44:29.1191578Z 2025-12-04T12:44:29.1191970Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 12:42:58.134000 348124 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 348193 2025-12-04T12:44:29.1192157Z I1204 12:42:58.134000 348124 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 348194 2025-12-04T12:44:29.1192330Z I1204 12:42:58.135000 348124 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 348195 2025-12-04T12:44:29.1192504Z I1204 12:42:58.135000 348124 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 348196 2025-12-04T12:44:29.1193206Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1193262Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1193983Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1194042Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1194740Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1194807Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1195482Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1195565Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1196076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1196156Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1196673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1196734Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1197246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1197311Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1197814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1197890Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1198587Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1198662Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1199355Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1199413Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1199987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1200059Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1200574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1200663Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1201358Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1201435Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1202107Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1202178Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1202702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1202770Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1203275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1203345Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T12:44:29.1203631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1203692Z local_shape = tensor.shape 2025-12-04T12:44:29.1203954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1204013Z tensor.shape, 2025-12-04T12:44:29.1204255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1204329Z tensor.dtype, 2025-12-04T12:44:29.1204575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1204641Z local_shape = tensor.shape 2025-12-04T12:44:29.1204896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1204961Z tensor.shape, 2025-12-04T12:44:29.1205198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1205278Z tensor.dtype, 2025-12-04T12:44:29.1205523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1205605Z local_shape = tensor.shape 2025-12-04T12:44:29.1205851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1205906Z tensor.shape, 2025-12-04T12:44:29.1206195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T12:44:29.1206242Z tensor.dtype, 2025-12-04T12:44:29.1206503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1206555Z local_shape = tensor.shape 2025-12-04T12:44:29.1206804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1206869Z tensor.shape, 2025-12-04T12:44:29.1207129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T12:44:29.1207177Z tensor.dtype, 2025-12-04T12:44:29.1207343Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1207511Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1207837Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1208000Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1208311Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1208453Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1208742Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1214579Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1214885Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1215039Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1215368Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1215509Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1215794Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1215940Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1216493Z E1204 12:43:07.432000 348193 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3839885312. 2025-12-04T12:44:29.1216624Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1216817Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1217253Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1217366Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1217573Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1217737Z E1204 12:43:07.432000 348193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1217870Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1218027Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1218309Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1218464Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1218748Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1218882Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1219156Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1219298Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1219632Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1219787Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1220060Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1220192Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1220466Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1220625Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1221153Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 803209216 and is now 3393191936. 2025-12-04T12:44:29.1221284Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1221476Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1221911Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1222025Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1222233Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1222393Z E1204 12:43:07.510000 348196 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1222526Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1222680Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1222961Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1223113Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1223405Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1223526Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1223797Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T12:44:29.1223939Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1224220Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1224363Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1224636Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1224765Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1225051Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1225192Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1225736Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 1268776960 and is now 3458203648. 
2025-12-04T12:44:29.1225847Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1226035Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1226467Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1226578Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1226784Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1226943Z E1204 12:43:07.522000 348195 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1227075Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1227230Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1227509Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1227667Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1227947Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1228068Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1228339Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1228482Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1228764Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1228905Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1229176Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1229316Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1229626Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1229789Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1230313Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 1268776960 and is now 3292528640. 2025-12-04T12:44:29.1230424Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1230613Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1231050Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1231158Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1231362Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1231520Z E1204 12:43:07.973000 348194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1231567Z FAILED [10.6224s] [100%] 2025-12-04T12:44:29.1231570Z 2025-12-04T12:44:29.1231631Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1231790Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T12:44:29.1231844Z Traceback (most recent call last): 2025-12-04T12:44:29.1232023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1232071Z self._join_processes(fn) 2025-12-04T12:44:29.1232245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1232303Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1232482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1232531Z raise RuntimeError(error) 2025-12-04T12:44:29.1232613Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1232661Z Traceback (most recent call last): 2025-12-04T12:44:29.1253426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1253484Z getattr(self, test_name)() 2025-12-04T12:44:29.1253648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1253683Z fn() 2025-12-04T12:44:29.1253837Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1253880Z method(*args, **kwargs) 2025-12-04T12:44:29.1254029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1254087Z method(*args, **kwargs) 2025-12-04T12:44:29.1254237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1254275Z with policy(): 2025-12-04T12:44:29.1254429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1254484Z raise RuntimeError(msg) 2025-12-04T12:44:29.1254894Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3839885312. 2025-12-04T12:44:29.1254897Z 2025-12-04T12:44:29.1254974Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1255286Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1255288Z 2025-12-04T12:44:29.1255377Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1255380Z 2025-12-04T12:44:29.1255382Z 2025-12-04T12:44:29.1255462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1255548Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T12:44:29.1255825Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d88bd703a098f5c0.xml - 2025-12-04T12:44:29.1255887Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1256224Z FAILED [10.6224s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1256272Z Traceback (most recent call last): 2025-12-04T12:44:29.1256455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1256520Z getattr(self, test_name)() 2025-12-04T12:44:29.1256681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1256720Z fn() 2025-12-04T12:44:29.1256871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1256912Z method(*args, **kwargs) 2025-12-04T12:44:29.1257061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1257101Z method(*args, **kwargs) 2025-12-04T12:44:29.1257250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1257287Z with policy(): 2025-12-04T12:44:29.1257450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1257495Z raise RuntimeError(msg) 2025-12-04T12:44:29.1257903Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 1438646272 and is now 3839885312. 2025-12-04T12:44:29.1257906Z 2025-12-04T12:44:29.1257980Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1258302Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1258304Z 2025-12-04T12:44:29.1258390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1258464Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
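The RuntimeError raised in _check_return_codes above is the parent test process translating a child's nonzero exit status (10 for a failed leak check) into a test failure. A minimal, hypothetical sketch of that spawn-and-check pattern, not the actual MultiProcessTestCase code, assuming a trivial per-rank body:

import multiprocessing as mp
import sys

def _worker(rank: int) -> None:
    # Stand-in for one per-rank test body (assumption); a real worker would run the
    # test and convert failures into a nonzero exit code, as seen in the log above.
    ok = True
    sys.exit(0 if ok else 10)

def main(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_worker, args=(rank,)) for rank in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Mirrors the _join_processes/_check_return_codes step: any nonzero exit code
    # fails the whole test.
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    main()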
2025-12-04T12:44:29.1258528Z ======================= 1 failed, 7 deselected in 10.63s ======================= 2025-12-04T12:44:29.1258565Z Got exit code 1 2025-12-04T12:44:29.1258824Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T12:44:29.1258953Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.1259179Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-cc44fcc8e549fc22.xml 2025-12-04T12:44:29.1259238Z ============================= test session starts ============================== 2025-12-04T12:44:29.1259353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1259395Z cachedir: .pytest_cache 2025-12-04T12:44:29.1259553Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1259722Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1259762Z configfile: pytest.ini 2025-12-04T12:44:29.1259927Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1259999Z collecting ... collected 8 items / 6 deselected / 2 selected 2025-12-04T12:44:29.1260053Z stepcurrent: skipping 6 already run items. 2025-12-04T12:44:29.1260097Z Running 2 items in this shard 2025-12-04T12:44:29.1260099Z 2025-12-04T12:44:29.1260433Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 12:43:11.574000 348662 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 348731 2025-12-04T12:44:29.1260607Z I1204 12:43:11.575000 348662 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 348732 2025-12-04T12:44:29.1260758Z I1204 12:43:11.575000 348662 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 348733 2025-12-04T12:44:29.1260909Z I1204 12:43:11.576000 348662 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 348734 2025-12-04T12:44:29.1261616Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1261664Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1262333Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1262400Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1263062Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1263115Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1263775Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1263817Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1264313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1264364Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1264848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1264897Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1265391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1265437Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1265916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T12:44:29.1265962Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1266098Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1266263Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1266549Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1266696Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1266972Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1267114Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1267383Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1267538Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1267805Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1267945Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1268214Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1268341Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1268613Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1268753Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1269237Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 
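The FutureWarning printed by test_hsdp_dtensor_state_dict.py above points at the torch.distributed.checkpoint state-dict helpers as the replacement for FSDP.state_dict_type()/FSDP.set_state_dict_type(). A hedged sketch of that call pattern follows; it assumes a process group is already initialized and that model and optim are the FSDP-wrapped module and optimizer from such a test, with cpu_offload=True mirroring the offload_to_cpu_True variant of the failing test:

import torch
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

def save_and_restore(model: torch.nn.Module, optim: torch.optim.Optimizer) -> None:
    opts = StateDictOptions(cpu_offload=True)  # keep gathered tensors on CPU
    model_sd, optim_sd = get_state_dict(model, optim, options=opts)
    # ... persist model_sd / optim_sd (e.g. with torch.distributed.checkpoint) ...
    set_state_dict(
        model,
        optim,
        model_state_dict=model_sd,
        optim_state_dict=optim_sd,
        options=opts,
    )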
2025-12-04T12:44:29.1269350Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1269543Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1269985Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1270097Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1270300Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1270460Z E1204 12:43:20.472000 348731 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1270593Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1270775Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1271056Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1271201Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1271482Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1271616Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1271890Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1272047Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1272316Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1272457Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1272725Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1272855Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1273125Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1273269Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1273746Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:44:29.1273855Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1274060Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1274437Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1274546Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1274750Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1274909Z E1204 12:43:20.487000 348732 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1275049Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1275208Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1275487Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1275633Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1275923Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1276037Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1276321Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1276460Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1276729Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1276870Z E1204 12:43:20.539000 
348734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1277137Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1277268Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1277539Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1277681Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1278152Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1256194048 and is now 3051356160. 2025-12-04T12:44:29.1278264Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1278464Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1278836Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1278945Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1279147Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1279315Z E1204 12:43:20.539000 348734 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1279447Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1279644Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1279925Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1280089Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1280369Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1280501Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1280773Z E1204 12:43:20.543000 348733 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1280911Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1281180Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1281320Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1281590Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1281720Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1281988Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1282129Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1282602Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
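The second failing test, test_hsdp_init_with_device_mesh_cuda, exercises hybrid-sharded FSDP built on a 2-D DeviceMesh. As an illustrative sketch only (not the test's actual code), assuming torch.distributed is already initialized with the 4 ranks used above, the construction looks roughly like this:

import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def build_hsdp_model(module: torch.nn.Module) -> FSDP:
    # Outer mesh dimension replicates, inner dimension shards (2 x 2 = 4 ranks).
    mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("replicate", "shard"))
    return FSDP(
        module.cuda(),
        device_mesh=mesh,
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    )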
2025-12-04T12:44:29.1282728Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1282917Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1283292Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1283403Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1283633Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1283795Z E1204 12:43:20.543000 348733 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1283836Z FAILED [10.3192s] [ 50%] 2025-12-04T12:44:29.1283838Z 2025-12-04T12:44:29.1283897Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1284004Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T12:44:29.1284052Z Traceback (most recent call last): 2025-12-04T12:44:29.1284215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1284272Z self._join_processes(fn) 2025-12-04T12:44:29.1284444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1284501Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1284694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1284743Z raise RuntimeError(error) 2025-12-04T12:44:29.1284823Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1284873Z Traceback (most recent call last): 2025-12-04T12:44:29.1285040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1285086Z getattr(self, test_name)() 2025-12-04T12:44:29.1285249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1285285Z fn() 2025-12-04T12:44:29.1285439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1285480Z method(*args, **kwargs) 2025-12-04T12:44:29.1285634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1285675Z method(*args, **kwargs) 2025-12-04T12:44:29.1285828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1285866Z with policy(): 2025-12-04T12:44:29.1286021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1286062Z raise RuntimeError(msg) 2025-12-04T12:44:29.1286417Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 2025-12-04T12:44:29.1286419Z 2025-12-04T12:44:29.1286495Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1286766Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1286769Z 2025-12-04T12:44:29.1286856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1286861Z 2025-12-04T12:44:29.1286862Z 2025-12-04T12:44:29.1286937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1287026Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1287300Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-cc44fcc8e549fc22.xml - 2025-12-04T12:44:29.1287363Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1287648Z FAILED [10.3192s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1287699Z Traceback (most recent call last): 2025-12-04T12:44:29.1287864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1287909Z getattr(self, test_name)() 2025-12-04T12:44:29.1288068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1288117Z fn() 2025-12-04T12:44:29.1288267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1288309Z method(*args, **kwargs) 2025-12-04T12:44:29.1288461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1288513Z method(*args, **kwargs) 2025-12-04T12:44:29.1288664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1288705Z with policy(): 2025-12-04T12:44:29.1288857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1288900Z raise RuntimeError(msg) 2025-12-04T12:44:29.1289256Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 
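The repro command printed above can also be driven from a small script; the sketch below simply sets the same environment variables the log names and runs the exact command from the repo root (adding PYTORCH_PRINT_REPRO_ON_FAILURE="0" would suppress the repro banner, per the log):

import os
import subprocess

env = dict(
    os.environ,
    PYTORCH_TEST_WITH_ROCM="1",
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
)
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
        "TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda",
    ],
    env=env,
    check=True,  # raise if the test exits nonzero, like the CI harness does
)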
2025-12-04T12:44:29.1289259Z 2025-12-04T12:44:29.1289334Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1289635Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1289639Z 2025-12-04T12:44:29.1289726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1289795Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1289857Z ======================= 1 failed, 6 deselected in 10.33s ======================= 2025-12-04T12:44:29.1289897Z Got exit code 1 2025-12-04T12:44:29.1289940Z Retrying single test... 2025-12-04T12:44:29.1290168Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-2cdbbc51773bb60a.xml 2025-12-04T12:44:29.1290227Z ============================= test session starts ============================== 2025-12-04T12:44:29.1290344Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1290386Z cachedir: .pytest_cache 2025-12-04T12:44:29.1290561Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1290610Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1290650Z configfile: pytest.ini 2025-12-04T12:44:29.1290818Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1290891Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1291144Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1291188Z Running 1 items in this shard 2025-12-04T12:44:29.1291191Z 2025-12-04T12:44:29.1291543Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 12:43:24.455000 349132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 349201 2025-12-04T12:44:29.1291700Z I1204 12:43:24.456000 349132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 349202 2025-12-04T12:44:29.1291854Z I1204 12:43:24.456000 349132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 349203 2025-12-04T12:44:29.1292004Z I1204 12:43:24.457000 349132 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 349204 2025-12-04T12:44:29.1292702Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1292762Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1293422Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1293469Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1294126Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1294171Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1294831Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1294872Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1295381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1295432Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1295923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1295975Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1296474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1296525Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1297008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1297073Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1297208Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1297364Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1297663Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1297808Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1298088Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1298204Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1298481Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1298623Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1298894Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1299037Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1299306Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1299438Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1299772Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1299915Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1300393Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1260388352 and is now 3051356160. 
2025-12-04T12:44:29.1300505Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1300712Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1301087Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1301196Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1301400Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1301576Z E1204 12:43:33.177000 349204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1301707Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1301877Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1302158Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1302304Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1302584Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1302698Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1302969Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1303107Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1303376Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1303516Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1303787Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1303928Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1304198Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1304340Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1304812Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:44:29.1304934Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1305127Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1305503Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1305611Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1305824Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1305984Z E1204 12:43:33.179000 349203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1306125Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1306277Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1306555Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1306704Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1306982Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1307097Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1307366Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1307504Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1307775Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1307914Z E1204 12:43:33.218000 
349201 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1308184Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1308324Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1308595Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1308735Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1309217Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 2025-12-04T12:44:29.1309330Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1309520Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1309936Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1310057Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1310259Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1310435Z E1204 12:43:33.218000 349201 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1310564Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1310716Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1310993Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1311141Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1311418Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1311537Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1311810Z E1204 12:43:33.266000 349202 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1311953Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1312223Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1312361Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1312650Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1312777Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1313048Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1313188Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1313673Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
2025-12-04T12:44:29.1313784Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1313973Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1314348Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1314466Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1314672Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1314854Z E1204 12:43:33.266000 349202 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1314896Z FAILED [10.0198s] [100%] 2025-12-04T12:44:29.1314898Z 2025-12-04T12:44:29.1314957Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1315066Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T12:44:29.1315118Z Traceback (most recent call last): 2025-12-04T12:44:29.1315284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1315331Z self._join_processes(fn) 2025-12-04T12:44:29.1315504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1315565Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1315747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1315793Z raise RuntimeError(error) 2025-12-04T12:44:29.1315873Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1315923Z Traceback (most recent call last): 2025-12-04T12:44:29.1316085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1316133Z getattr(self, test_name)() 2025-12-04T12:44:29.1316293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1316331Z fn() 2025-12-04T12:44:29.1316485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1316531Z method(*args, **kwargs) 2025-12-04T12:44:29.1316697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1316740Z method(*args, **kwargs) 2025-12-04T12:44:29.1316892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1316932Z with policy(): 2025-12-04T12:44:29.1317087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1317133Z raise RuntimeError(msg) 2025-12-04T12:44:29.1317492Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1260388352 and is now 3051356160. 2025-12-04T12:44:29.1317495Z 2025-12-04T12:44:29.1317582Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1317845Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1317847Z 2025-12-04T12:44:29.1317936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1317938Z 2025-12-04T12:44:29.1317940Z 2025-12-04T12:44:29.1318018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1318120Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1318396Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-2cdbbc51773bb60a.xml - 2025-12-04T12:44:29.1318459Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1318742Z FAILED [10.0198s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T12:44:29.1318790Z Traceback (most recent call last): 2025-12-04T12:44:29.1318956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1319003Z getattr(self, test_name)() 2025-12-04T12:44:29.1319162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1319202Z fn() 2025-12-04T12:44:29.1319353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1319397Z method(*args, **kwargs) 2025-12-04T12:44:29.1319550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1319622Z method(*args, **kwargs) 2025-12-04T12:44:29.1319773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1319813Z with policy(): 2025-12-04T12:44:29.1319965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1320009Z raise RuntimeError(msg) 2025-12-04T12:44:29.1320360Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1260388352 and is now 3051356160. 
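The RuntimeError above comes from the leak checker snapshotting per-device memory before the test body and comparing again afterwards, once for the caching allocator and once for the driver, which is where the "was ... and is now ..." numbers originate. The following is a minimal illustrative sketch of that before/after comparison only; it is not the actual CUDAMemoryLeakCheck code in common_utils.py, and the helper name run_with_leak_check is hypothetical.

import torch

def run_with_leak_check(test_fn, device=0):
    # Snapshot caching-allocator usage and driver-side free memory before the test.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, _total = torch.cuda.mem_get_info(device)

    test_fn()

    # Re-check after the test; growth on either counter is reported as a leak,
    # mirroring the allocator/driver byte counts printed in the log above.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _total = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver free "
            f"{free_before} -> {free_after} bytes"
        )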
2025-12-04T12:44:29.1320366Z 2025-12-04T12:44:29.1320439Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1320713Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1320715Z 2025-12-04T12:44:29.1320802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1320867Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1320931Z ======================= 1 failed, 7 deselected in 10.03s ======================= 2025-12-04T12:44:29.1320970Z Got exit code 1 2025-12-04T12:44:29.1321013Z Retrying single test... 2025-12-04T12:44:29.1321241Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-66cb2d4bdeb6a169.xml 2025-12-04T12:44:29.1321299Z ============================= test session starts ============================== 2025-12-04T12:44:29.1321427Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1321471Z cachedir: .pytest_cache 2025-12-04T12:44:29.1321634Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1321680Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1321724Z configfile: pytest.ini 2025-12-04T12:44:29.1321890Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1321965Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1322229Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1322277Z Running 1 items in this shard 2025-12-04T12:44:29.1322279Z 2025-12-04T12:44:29.1322613Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 12:43:37.261000 349602 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 349671 2025-12-04T12:44:29.1322781Z I1204 12:43:37.262000 349602 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 349672 2025-12-04T12:44:29.1322937Z I1204 12:43:37.262000 349602 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 349673 2025-12-04T12:44:29.1323088Z I1204 12:43:37.263000 349602 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 349674 2025-12-04T12:44:29.1323765Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
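The repro line the log prints under "To execute this test" can also be launched from Python; a sketch under the assumption that it is run from the base repo dir, using only the environment variables and test path already shown above:

import os
import subprocess

# Same command the log prints for reproducing the leak-check failure locally.
env = dict(
    os.environ,
    PYTORCH_TEST_WITH_ROCM="1",
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
)
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
        "TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda",
    ],
    env=env,
    check=False,
)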
2025-12-04T12:44:29.1323809Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1324477Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1324522Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1325192Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1325238Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1325899Z /var/lib/jenkins/pytorch/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1325941Z FSDP.set_state_dict_type( 2025-12-04T12:44:29.1326455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1326505Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1326990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1327051Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1327540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1327601Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1328085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T12:44:29.1328135Z device = _get_pg_default_device(group) 2025-12-04T12:44:29.1328271Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1328431Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1328712Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1328861Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1329142Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1329258Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1329542Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1329716Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1329988Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1330130Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1330403Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1330551Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1330822Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1330967Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1331449Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 
2025-12-04T12:44:29.1331586Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1331791Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1332167Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1332278Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1332482Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1332642Z E1204 12:43:46.003000 349671 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1332774Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1332929Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1333207Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1333357Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1333635Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1333753Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1334038Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1334179Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1334449Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1334589Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1334868Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1334997Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1335271Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1335412Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1335893Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 1254096896 and is now 3051356160. 2025-12-04T12:44:29.1336014Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1336203Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1336588Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1336695Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1336900Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1337060Z E1204 12:43:46.063000 349674 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1337190Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1337345Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1337622Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1337771Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1338048Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1338176Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1338445Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1338585Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1338855Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1338994Z E1204 12:43:46.080000 
349673 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1339275Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1339404Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1339714Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1339868Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1340340Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 1268776960 and is now 3051356160. 2025-12-04T12:44:29.1340462Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1340652Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1341028Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1341136Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1341341Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1341501Z E1204 12:43:46.080000 349673 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1341632Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1341786Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1342064Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1342213Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1342503Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1342621Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1342886Z E1204 12:43:46.083000 349672 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1343031Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1343299Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1343454Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1343725Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1343851Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1344123Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1344274Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1344748Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 1268776960 and is now 3051356160. 
2025-12-04T12:44:29.1344870Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1345059Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1345434Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1345541Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1345751Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1345908Z E1204 12:43:46.083000 349672 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1345953Z FAILED [10.0218s] [100%] 2025-12-04T12:44:29.1345955Z 2025-12-04T12:44:29.1346012Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1346123Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T12:44:29.1346171Z Traceback (most recent call last): 2025-12-04T12:44:29.1346338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1346382Z self._join_processes(fn) 2025-12-04T12:44:29.1346558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1346627Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1346806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1346853Z raise RuntimeError(error) 2025-12-04T12:44:29.1346933Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1346981Z Traceback (most recent call last): 2025-12-04T12:44:29.1347144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1347191Z getattr(self, test_name)() 2025-12-04T12:44:29.1347352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1347390Z fn() 2025-12-04T12:44:29.1347559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1347608Z method(*args, **kwargs) 2025-12-04T12:44:29.1347759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1347802Z method(*args, **kwargs) 2025-12-04T12:44:29.1347954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1347995Z with policy(): 2025-12-04T12:44:29.1348147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1348206Z raise RuntimeError(msg) 2025-12-04T12:44:29.1348559Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 2025-12-04T12:44:29.1348572Z 2025-12-04T12:44:29.1348650Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1348907Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1348909Z 2025-12-04T12:44:29.1348998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1349000Z 2025-12-04T12:44:29.1349002Z 2025-12-04T12:44:29.1349080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1349167Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1349444Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-66cb2d4bdeb6a169.xml - 2025-12-04T12:44:29.1349506Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1350042Z FAILED [10.0218s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1350089Z Traceback (most recent call last): 2025-12-04T12:44:29.1350257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1350301Z getattr(self, test_name)() 2025-12-04T12:44:29.1350465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1350502Z fn() 2025-12-04T12:44:29.1350652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1350697Z method(*args, **kwargs) 2025-12-04T12:44:29.1350865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1350908Z method(*args, **kwargs) 2025-12-04T12:44:29.1351058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1351098Z with policy(): 2025-12-04T12:44:29.1351249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1351294Z raise RuntimeError(msg) 2025-12-04T12:44:29.1351648Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 
2025-12-04T12:44:29.1351651Z 2025-12-04T12:44:29.1351745Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1352002Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1352004Z 2025-12-04T12:44:29.1352096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1352162Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1352226Z ======================= 1 failed, 7 deselected in 10.03s ======================= 2025-12-04T12:44:29.1352284Z Got exit code 1 2025-12-04T12:44:29.1352494Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T12:44:29.1352623Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.1352863Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-02c126275fdbc82a.xml 2025-12-04T12:44:29.1352924Z ============================= test session starts ============================== 2025-12-04T12:44:29.1353037Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1353082Z cachedir: .pytest_cache 2025-12-04T12:44:29.1353241Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1353290Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1353331Z configfile: pytest.ini 2025-12-04T12:44:29.1353499Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1353571Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1353628Z stepcurrent: skipping 7 already run items. 2025-12-04T12:44:29.1353671Z Running 1 items in this shard 2025-12-04T12:44:29.1353674Z 2025-12-04T12:44:29.1354005Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 12:43:50.179000 350072 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 350141 2025-12-04T12:44:29.1354161Z I1204 12:43:50.179000 350072 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 350142 2025-12-04T12:44:29.1354318Z I1204 12:43:50.180000 350072 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 350143 2025-12-04T12:44:29.1354473Z I1204 12:43:50.180000 350072 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 350144 2025-12-04T12:44:29.1355570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). 
If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1355700Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1356775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1356912Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1357970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1358106Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1359169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
2025-12-04T12:44:29.1359291Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1360184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1360282Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1360990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1361098Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1361800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1361892Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1362610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1362709Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1363405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1363470Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1364168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1364232Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1364927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1365006Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1365703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1365763Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1365900Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1366066Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1366351Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1366498Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1366780Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1366910Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1367183Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1367343Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1367613Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1367758Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1368029Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1368163Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1368440Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1368581Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1369059Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
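The FutureWarnings above point at the replacement for FSDP.set_state_dict_type(): the get_state_dict()/set_state_dict() helpers in torch.distributed.checkpoint.state_dict. Below is a minimal sketch of that API on a stand-in module; the Linear model and SGD optimizer are hypothetical placeholders, not the test's actual FSDP-wrapped model.

import torch
import torch.nn as nn
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

# Hypothetical stand-ins; the test would pass its FSDP-wrapped model instead.
model = nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Non-deprecated path the warning recommends: fetch both state dicts...
model_sd, optim_sd = get_state_dict(model, optimizer)
# ...and restore them later, instead of toggling FSDP.set_state_dict_type().
set_state_dict(
    model,
    optimizer,
    model_state_dict=model_sd,
    optim_state_dict=optim_sd,
)

Per the doc link in the warning, the same pair of helpers also covers DDP and FSDP2 models, so one code path can serve the different parallelisms the test suite exercises.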
2025-12-04T12:44:29.1369170Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1369366Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1369777Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1369888Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1370095Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1370255Z E1204 12:43:58.880000 350144 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1370408Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1370562Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1370844Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1370992Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1371287Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1371401Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1371674Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1371833Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1372102Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1372245Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1372514Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1372646Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1372917Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1373061Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1373531Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:44:29.1373641Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1373848Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1374220Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1374330Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1374536Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1374696Z E1204 12:43:58.903000 350142 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1374842Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1374996Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1375275Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1375421Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1375709Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1375825Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1376110Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1376250Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1376520Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1376662Z E1204 12:43:58.905000 350141 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1376934Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1377065Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1377334Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1377478Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1377942Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3208642560. 2025-12-04T12:44:29.1378054Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1378258Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1378628Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1378739Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1378942Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1379115Z E1204 12:43:58.905000 350141 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1379252Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1379408Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1379733Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1379895Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1380178Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1380319Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1380596Z E1204 12:43:58.916000 350143 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1380738Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1381010Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1381153Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1381425Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1381556Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1381828Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1381974Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1382442Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
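[editor's note] For context on the RuntimeError above: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 wraps each test in a before/after comparison of per-device memory counters, and the numbers quoted in the message (caching allocator 0 -> 512, driver 1268776960 -> 3047161856) are exactly those counters. The snippet below is a minimal sketch of that kind of check using public torch.cuda APIs; it is not the actual CudaMemoryLeakCheck implementation in torch/testing/_internal/common_utils.py, which also garbage-collects, synchronizes and re-checks before reporting a leak.

```python
# Rough sketch of the before/after comparison the mem-leak check performs.
# Illustrative only; the real check is stricter and retries before failing.
import torch

def run_with_leak_check(test_fn, device=0):
    torch.cuda.synchronize(device)
    caching_before = torch.cuda.memory_allocated(device)   # caching allocator bytes
    free_b, total_b = torch.cuda.mem_get_info(device)
    driver_before = total_b - free_b                        # driver-allocated bytes

    test_fn()

    torch.cuda.synchronize(device)
    caching_after = torch.cuda.memory_allocated(device)
    free_a, total_a = torch.cuda.mem_get_info(device)
    driver_after = total_a - free_a

    if caching_after > caching_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible leak: caching allocator {caching_before} -> {caching_after}, "
            f"driver {driver_before} -> {driver_after} on device {device}"
        )
```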
2025-12-04T12:44:29.1382567Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1382759Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1383136Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1383249Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1383468Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1383631Z E1204 12:43:58.916000 350143 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1383673Z FAILED [10.0235s] [100%] 2025-12-04T12:44:29.1383675Z 2025-12-04T12:44:29.1383735Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1383843Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T12:44:29.1383895Z Traceback (most recent call last): 2025-12-04T12:44:29.1384059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1384118Z self._join_processes(fn) 2025-12-04T12:44:29.1384293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1384351Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1384547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1384591Z raise RuntimeError(error) 2025-12-04T12:44:29.1384675Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1384721Z Traceback (most recent call last): 2025-12-04T12:44:29.1384888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1384931Z getattr(self, test_name)() 2025-12-04T12:44:29.1385096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1385132Z fn() 2025-12-04T12:44:29.1385288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1385330Z method(*args, **kwargs) 2025-12-04T12:44:29.1385487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1385529Z method(*args, **kwargs) 2025-12-04T12:44:29.1385683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1385722Z with policy(): 2025-12-04T12:44:29.1385878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1385919Z raise RuntimeError(msg) 2025-12-04T12:44:29.1386272Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:44:29.1386274Z 2025-12-04T12:44:29.1386351Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1386620Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1386623Z 2025-12-04T12:44:29.1386714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1386717Z 2025-12-04T12:44:29.1386718Z 2025-12-04T12:44:29.1386795Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1386886Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1387157Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-02c126275fdbc82a.xml - 2025-12-04T12:44:29.1387222Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1387505Z FAILED [10.0235s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1387555Z Traceback (most recent call last): 2025-12-04T12:44:29.1387719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1387766Z getattr(self, test_name)() 2025-12-04T12:44:29.1387927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1387977Z fn() 2025-12-04T12:44:29.1388129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1388173Z method(*args, **kwargs) 2025-12-04T12:44:29.1388329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1388381Z method(*args, **kwargs) 2025-12-04T12:44:29.1388536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1388575Z with policy(): 2025-12-04T12:44:29.1388729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1388771Z raise RuntimeError(msg) 2025-12-04T12:44:29.1389120Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
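[editor's note] The repro command printed in the message above can be run as-is from the repo root. For anyone who prefers to stay inside Python, the sketch below drives the same test through pytest (the node id is the one shown later in this log; the env vars are the ones the message itself names, and the pytest.main wrapper is only a convenience, not part of the official workflow).

```python
# Convenience wrapper around the printed repro command; run from the pytorch repo root.
import os
import pytest

os.environ["PYTORCH_TEST_WITH_ROCM"] = "1"
os.environ["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
# Set to "0" to suppress the repro banner, as the message notes.
os.environ.setdefault("PYTORCH_PRINT_REPRO_ON_FAILURE", "1")

exit_code = pytest.main([
    "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::"
    "TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda",
    "-x", "-v",
])
raise SystemExit(exit_code)
```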
2025-12-04T12:44:29.1389123Z 2025-12-04T12:44:29.1389198Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1389456Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1389459Z 2025-12-04T12:44:29.1389547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1389726Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1389789Z ======================= 1 failed, 7 deselected in 10.03s ======================= 2025-12-04T12:44:29.1389831Z Got exit code 1 2025-12-04T12:44:29.1389872Z Retrying single test... 2025-12-04T12:44:29.1390099Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-28db2e4f1bd8cda0.xml 2025-12-04T12:44:29.1390162Z ============================= test session starts ============================== 2025-12-04T12:44:29.1390277Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1390323Z cachedir: .pytest_cache 2025-12-04T12:44:29.1390498Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1390551Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1390593Z configfile: pytest.ini 2025-12-04T12:44:29.1390762Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1390835Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1391086Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1391131Z Running 1 items in this shard 2025-12-04T12:44:29.1391133Z 2025-12-04T12:44:29.1391482Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 12:44:02.782000 350542 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 350611 2025-12-04T12:44:29.1391640Z I1204 12:44:02.783000 350542 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 350612 2025-12-04T12:44:29.1391797Z I1204 12:44:02.784000 350542 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 350613 2025-12-04T12:44:29.1391948Z I1204 12:44:02.784000 350542 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 350614 2025-12-04T12:44:29.1393030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). 
If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1393171Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1394237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1394361Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1395424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1395551Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1396618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
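[editor's note] The UserWarning above is aimed at operator authors: an in-place op (here c10d::allreduce_) reached the autograd engine without an Autograd kernel registered. For a custom operator you own, the fix it suggests (an explicit Autograd fallthrough, the Python analogue of torch::CppFunction::makeFallthrough()) looks roughly like the sketch below. The "myops::noop_" operator is purely hypothetical and only stands in for your own op; this is not how c10d registers allreduce_.

```python
# Hypothetical example of silencing this class of warning for an operator you own
# by registering an explicit Autograd fallthrough. "myops" / "noop_" are made-up names.
import torch

lib = torch.library.Library("myops", "DEF")
lib.define("noop_(Tensor(a!) x) -> Tensor(a!)")

def noop_impl(x):
    return x  # trivial in-place-style op: returns its input unchanged

lib.impl("noop_", noop_impl, "CompositeExplicitAutograd")
# Declare "no autograd behaviour" explicitly instead of relying on the deprecated fallback:
lib.impl("noop_", torch.library.fallthrough_kernel, "Autograd")
```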
2025-12-04T12:44:29.1396743Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1397457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1397580Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1398285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1398379Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1399087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1399178Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1399944Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1400034Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1400752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T12:44:29.1400815Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1401527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1401592Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1402290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1402367Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1403068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
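[editor's note] The FutureWarning repeated above points at the replacement API in torch.distributed.checkpoint.state_dict. A minimal sketch of the recommended direction is below; `model` and `optim` are placeholders for an already FSDP-wrapped module and its optimizer, and the options you pass will depend on what the deprecated set_state_dict_type call was configuring.

```python
# Sketch of the migration the FutureWarning recommends: instead of
# FSDP.set_state_dict_type(...), use the checkpoint state_dict helpers.
# `model` / `optim` are placeholders, not defined here.
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
    set_state_dict,
)

options = StateDictOptions(full_state_dict=False)  # sharded state dict; adjust as needed

# Save side: one call returns both model and optimizer state.
model_state, optim_state = get_state_dict(model, optim, options=options)

# Load side: feed the same pair back in.
set_state_dict(
    model,
    optim,
    model_state_dict=model_state,
    optim_state_dict=optim_state,
    options=options,
)
```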
2025-12-04T12:44:29.1403139Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1403279Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1403436Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1403723Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1403875Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1404157Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1404276Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1404548Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1404693Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1404976Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1405122Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1405390Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1405524Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1405796Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1405952Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1406422Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1258291200 and is now 3047161856. 
2025-12-04T12:44:29.1406532Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1406736Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1407114Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1407239Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1407447Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1407604Z E1204 12:44:11.619000 350613 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1407740Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1407897Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1408185Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1408334Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1408616Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1408732Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1409008Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1409153Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1409434Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1409623Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1409894Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1410027Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1410312Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1410457Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1410925Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:44:29.1411051Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1411243Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1411619Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1411743Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1411947Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1412107Z E1204 12:44:11.620000 350612 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1412239Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1412397Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1412680Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1412827Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1413107Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1413223Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1413496Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1413652Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1413923Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1414067Z E1204 12:44:11.632000 350611 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1414335Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1414467Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1414759Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1414904Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1415373Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 2025-12-04T12:44:29.1415496Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1415689Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1416075Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1416184Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1416387Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1416551Z E1204 12:44:11.632000 350611 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1416681Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1416836Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1417115Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1417263Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1417544Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1417659Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1417931Z E1204 12:44:11.634000 350614 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1418085Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1418357Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1418498Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1418769Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1418912Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1419183Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1419325Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1419827Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
2025-12-04T12:44:29.1419958Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1420149Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1420539Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1420650Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1420854Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1421015Z E1204 12:44:11.634000 350614 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1421056Z FAILED [10.3199s] [100%] 2025-12-04T12:44:29.1421058Z 2025-12-04T12:44:29.1421121Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1421232Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T12:44:29.1421285Z Traceback (most recent call last): 2025-12-04T12:44:29.1421451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1421500Z self._join_processes(fn) 2025-12-04T12:44:29.1421673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1421732Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1421911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1421959Z raise RuntimeError(error) 2025-12-04T12:44:29.1422040Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1422091Z Traceback (most recent call last): 2025-12-04T12:44:29.1422266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1422313Z getattr(self, test_name)() 2025-12-04T12:44:29.1422475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1422511Z fn() 2025-12-04T12:44:29.1422667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1422709Z method(*args, **kwargs) 2025-12-04T12:44:29.1422867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1422909Z method(*args, **kwargs) 2025-12-04T12:44:29.1423081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1423121Z with policy(): 2025-12-04T12:44:29.1423280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1423323Z raise RuntimeError(msg) 2025-12-04T12:44:29.1423679Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 2025-12-04T12:44:29.1423694Z 2025-12-04T12:44:29.1423770Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1424026Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1424039Z 2025-12-04T12:44:29.1424128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1424134Z 2025-12-04T12:44:29.1424196Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1424246Z Traceback (most recent call last): 2025-12-04T12:44:29.1424409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1424457Z getattr(self, test_name)() 2025-12-04T12:44:29.1424617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1424657Z fn() 2025-12-04T12:44:29.1424809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1424854Z method(*args, **kwargs) 2025-12-04T12:44:29.1425005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1425050Z method(*args, **kwargs) 2025-12-04T12:44:29.1425202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1425245Z with policy(): 2025-12-04T12:44:29.1425399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1425443Z raise RuntimeError(msg) 2025-12-04T12:44:29.1425790Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1258291200 and is now 3047161856. 2025-12-04T12:44:29.1425794Z 2025-12-04T12:44:29.1425873Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1426140Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1426148Z 2025-12-04T12:44:29.1426235Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1426237Z 2025-12-04T12:44:29.1426239Z 2025-12-04T12:44:29.1426319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1426406Z Process 0 terminated with exit code 10, terminating remaining processes. 
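[editor's note] The "Process 0 terminated with exit code 10, terminating remaining processes" line comes from the multi-process test harness: each rank runs the test body in its own process and the parent inspects the exit codes (10 being the code used when an in-process check fails). The sketch below shows that general spawn-and-check pattern with torch.multiprocessing; it is a simplification, not the MultiProcessTestCase code in common_distributed.py.

```python
# Simplified spawn-and-check pattern behind messages like
# "Process 0 terminated with exit code 10": one process per rank,
# then fail if any rank exited non-zero.
import torch.multiprocessing as mp

WORLD_SIZE = 4

def rank_main(rank):
    # ... per-rank test body would go here ...
    pass

def run_all_ranks():
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=rank_main, args=(r,)) for r in range(WORLD_SIZE)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_all_ranks()
```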
2025-12-04T12:44:29.1426684Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-28db2e4f1bd8cda0.xml - 2025-12-04T12:44:29.1426747Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1427031Z FAILED [10.3199s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T12:44:29.1427080Z Traceback (most recent call last): 2025-12-04T12:44:29.1427248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1427291Z getattr(self, test_name)() 2025-12-04T12:44:29.1427454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1427490Z fn() 2025-12-04T12:44:29.1427644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1427701Z method(*args, **kwargs) 2025-12-04T12:44:29.1427852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1427894Z method(*args, **kwargs) 2025-12-04T12:44:29.1428056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1428098Z with policy(): 2025-12-04T12:44:29.1428252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1428297Z raise RuntimeError(msg) 2025-12-04T12:44:29.1428647Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3206545408. 
2025-12-04T12:44:29.1428650Z 2025-12-04T12:44:29.1428729Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1428983Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1428986Z 2025-12-04T12:44:29.1429078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1429080Z 2025-12-04T12:44:29.1429140Z Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1429189Z Traceback (most recent call last): 2025-12-04T12:44:29.1429353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1429398Z getattr(self, test_name)() 2025-12-04T12:44:29.1429559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1429723Z fn() 2025-12-04T12:44:29.1429878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1429918Z method(*args, **kwargs) 2025-12-04T12:44:29.1430074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1430141Z method(*args, **kwargs) 2025-12-04T12:44:29.1430296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1430333Z with policy(): 2025-12-04T12:44:29.1430488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1430530Z raise RuntimeError(msg) 2025-12-04T12:44:29.1430879Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1258291200 and is now 3047161856. 2025-12-04T12:44:29.1430882Z 2025-12-04T12:44:29.1430956Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1431225Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1431227Z 2025-12-04T12:44:29.1431319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1431384Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1431451Z ======================= 1 failed, 7 deselected in 10.33s ======================= 2025-12-04T12:44:29.1431489Z Got exit code 1 2025-12-04T12:44:29.1431533Z Retrying single test... 
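[editor's note] "Got exit code 1" followed by "Retrying single test..." is the CI test runner re-invoking pytest on just the failing node id, with each attempt writing its own JUnit XML report (note the new hash in the report filename below). A rough sketch of that retry loop follows; the flag set and report naming are illustrative, not run_test.py's exact arguments.

```python
# Rough sketch of the "retry a single failing test" loop visible in this log:
# re-run one node id a few times, each attempt with its own XML report.
import subprocess
import sys
import uuid

NODE_ID = (
    "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::"
    "TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda"
)
MAX_RETRIES = 3

for attempt in range(MAX_RETRIES):
    report = f"test-reports/python-pytest/retry-{uuid.uuid4().hex}.xml"
    code = subprocess.call(
        [sys.executable, "-m", "pytest", NODE_ID, "-x", "-v", f"--junit-xml={report}"]
    )
    print(f"Got exit code {code}")
    if code == 0:
        break
    print("Retrying single test...")
else:
    sys.exit(code)
```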
2025-12-04T12:44:29.1431776Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-8c7b7a26ccdec75d.xml 2025-12-04T12:44:29.1431839Z ============================= test session starts ============================== 2025-12-04T12:44:29.1431955Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1432013Z cachedir: .pytest_cache 2025-12-04T12:44:29.1432171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1432223Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1432266Z configfile: pytest.ini 2025-12-04T12:44:29.1432434Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1432507Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T12:44:29.1432757Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1432802Z Running 1 items in this shard 2025-12-04T12:44:29.1432804Z 2025-12-04T12:44:29.1433137Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 12:44:15.600000 351012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 351081 2025-12-04T12:44:29.1433294Z I1204 12:44:15.601000 351012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 351082 2025-12-04T12:44:29.1433450Z I1204 12:44:15.601000 351012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 351083 2025-12-04T12:44:29.1433603Z I1204 12:44:15.602000 351012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 351084 2025-12-04T12:44:29.1434694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1434823Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1435898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. 
DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1436022Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1437084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1437230Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1438292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T12:44:29.1438413Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T12:44:29.1439128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1439234Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1439983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1440080Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1440788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1440883Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1441580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1441697Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1442400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1442467Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1443162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1443226Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1443919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
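The FutureWarning above recommends migrating from FSDP.set_state_dict_type() to the parallelism-agnostic get_state_dict()/set_state_dict() APIs in torch.distributed.checkpoint.state_dict. Below is a minimal single-process sketch of that migration; the Linear model and SGD optimizer are placeholders, and on a real FSDP run the same two calls are made on the wrapped module (see the API doc and tutorial linked in the warning).

import torch
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

# Placeholder model/optimizer; in the tests above this would be the FSDP-wrapped module.
model = torch.nn.Linear(4, 4)
optim = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(2, 4)).sum().backward()
optim.step()  # populate optimizer state before snapshotting it

# Extract model and optimizer state dicts in one call, independent of FSDP1/FSDP2/DDP.
model_state_dict, optim_state_dict = get_state_dict(model, optimizers=optim)

# ... persist the two dicts, e.g. with torch.distributed.checkpoint ...

# Restore them onto the (possibly wrapped) model and optimizer.
set_state_dict(
    model,
    optimizers=optim,
    model_state_dict=model_state_dict,
    optim_state_dict=optim_state_dict,
)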
2025-12-04T12:44:29.1443982Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1444697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T12:44:29.1444756Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T12:44:29.1444894Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1445049Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1445345Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1445494Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1445779Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1445898Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1446180Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1446325Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1446611Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1446753Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1447020Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1447151Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1447425Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1447567Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1448043Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:44:29.1448155Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1448348Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1448735Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1448845Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1449053Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1449211Z E1204 12:44:24.432000 351083 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T12:44:29.1449344Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1449497Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1449831Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1449978Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1450257Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1450389Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1450659Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1450818Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1451088Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1451230Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1451498Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1451630Z E1204 12:44:24.465000 351081 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1451902Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1452045Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1452521Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 1438646272 and is now 3210739712. 2025-12-04T12:44:29.1452629Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1452820Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1453207Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1453319Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1453523Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1453685Z E1204 12:44:24.465000 351081 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T12:44:29.1453815Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1453977Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1454258Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1454404Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1454683Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1454808Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1455079Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1455230Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1455498Z E1204 12:44:24.511000 351084 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1455639Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1455908Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1456039Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1456314Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1456458Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1456920Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 1260388352 and is now 3047161856. 2025-12-04T12:44:29.1457030Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1457222Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1457606Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1457715Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1457917Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1458077Z E1204 12:44:24.511000 351084 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T12:44:29.1458218Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T12:44:29.1458376Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T12:44:29.1458660Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1458806Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T12:44:29.1459084Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T12:44:29.1459212Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T12:44:29.1459485Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1459679Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1459949Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1460090Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T12:44:29.1460360Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1460491Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T12:44:29.1460764Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1460906Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T12:44:29.1461373Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
2025-12-04T12:44:29.1461485Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1461697Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1462068Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1462178Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T12:44:29.1462382Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1462540Z E1204 12:44:24.512000 351082 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T12:44:29.1462596Z FAILED [10.1208s] [100%] 2025-12-04T12:44:29.1462598Z 2025-12-04T12:44:29.1462660Z =================================== FAILURES =================================== 2025-12-04T12:44:29.1462768Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T12:44:29.1462818Z Traceback (most recent call last): 2025-12-04T12:44:29.1462981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T12:44:29.1463028Z self._join_processes(fn) 2025-12-04T12:44:29.1463201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T12:44:29.1463276Z self._check_return_codes(fn, elapsed_time) 2025-12-04T12:44:29.1463460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T12:44:29.1463505Z raise RuntimeError(error) 2025-12-04T12:44:29.1463599Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1463645Z Traceback (most recent call last): 2025-12-04T12:44:29.1463815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1463857Z getattr(self, test_name)() 2025-12-04T12:44:29.1464021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1464057Z fn() 2025-12-04T12:44:29.1464211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1464252Z method(*args, **kwargs) 2025-12-04T12:44:29.1464406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1464447Z method(*args, **kwargs) 2025-12-04T12:44:29.1464602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1464640Z with policy(): 2025-12-04T12:44:29.1464795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1464836Z raise RuntimeError(msg) 2025-12-04T12:44:29.1465185Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 2025-12-04T12:44:29.1465188Z 2025-12-04T12:44:29.1465262Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1465519Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1465522Z 2025-12-04T12:44:29.1465625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1465627Z 2025-12-04T12:44:29.1465629Z 2025-12-04T12:44:29.1465705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:44:29.1465794Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T12:44:29.1466065Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-8c7b7a26ccdec75d.xml - 2025-12-04T12:44:29.1466131Z =========================== short test summary info ============================ 2025-12-04T12:44:29.1466400Z FAILED [10.1208s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T12:44:29.1466461Z Traceback (most recent call last): 2025-12-04T12:44:29.1466628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T12:44:29.1466674Z getattr(self, test_name)() 2025-12-04T12:44:29.1466834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T12:44:29.1466872Z fn() 2025-12-04T12:44:29.1467023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1467076Z method(*args, **kwargs) 2025-12-04T12:44:29.1467230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:44:29.1467269Z method(*args, **kwargs) 2025-12-04T12:44:29.1467423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:44:29.1467472Z with policy(): 2025-12-04T12:44:29.1467628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:44:29.1467669Z raise RuntimeError(msg) 2025-12-04T12:44:29.1468017Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 1268776960 and is now 3047161856. 
2025-12-04T12:44:29.1468019Z 2025-12-04T12:44:29.1468094Z To execute this test, run the following from the base repo dir: 2025-12-04T12:44:29.1468349Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1468351Z 2025-12-04T12:44:29.1468440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:44:29.1468508Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:44:29.1468570Z ======================= 1 failed, 7 deselected in 10.13s ======================= 2025-12-04T12:44:29.1468609Z Got exit code 1 2025-12-04T12:44:29.1468816Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T12:44:29.1468946Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:44:29.1469174Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6462e47ce8fb2aa4.xml 2025-12-04T12:44:29.1469232Z ============================= test session starts ============================== 2025-12-04T12:44:29.1469349Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:44:29.1469391Z cachedir: .pytest_cache 2025-12-04T12:44:29.1469564Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:44:29.1469650Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T12:44:29.1469694Z configfile: pytest.ini 2025-12-04T12:44:29.1469856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T12:44:29.1469930Z collecting ... collected 8 items / 8 deselected / 0 selected 2025-12-04T12:44:29.1469983Z stepcurrent: skipping 8 already run items. 
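The leak failure above is produced by the check enabled with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: per-device allocation counters are snapshotted before the test body and compared afterwards, and the error reports both the caching-allocator and the driver-level numbers. The context manager below is a minimal sketch of that before/after comparison using only public torch.cuda APIs; it is not the internal implementation in torch.testing._internal.common_utils, and the helper name is made up.

import contextlib
import torch

@contextlib.contextmanager
def assert_no_cuda_leak(device: int):
    # Snapshot caching-allocator and driver-level usage before the block.
    torch.cuda.synchronize(device)
    caching_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before
    try:
        yield
    finally:
        torch.cuda.synchronize(device)
        caching_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        # Fail on caching-allocator growth, quoting the driver numbers for context,
        # in the same format as the RuntimeError messages above.
        if caching_after > caching_before:
            raise RuntimeError(
                f"Caching allocator allocated memory was {caching_before} and is now "
                f"reported as {caching_after} on device {device}. Driver allocated "
                f"memory was {driver_before} and is now {driver_after}."
            )

if torch.cuda.is_available():
    with assert_no_cuda_leak(0):
        x = torch.ones(128, device="cuda:0")
        del x  # everything allocated inside the block is released again

In the failures above the caching-allocator delta is only 512 bytes per device, while the driver-level growth is roughly 1.8 GB, which is why the message quotes both views.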
2025-12-04T12:44:29.1470029Z Running 0 items in this shard 2025-12-04T12:44:29.1470031Z 2025-12-04T12:44:29.1470300Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6462e47ce8fb2aa4.xml - 2025-12-04T12:44:29.1470381Z ============================ 8 deselected in 0.00s ============================= 2025-12-04T12:44:29.1472150Z The following tests failed consistently: ['test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda'] 2025-12-04T12:44:29.1472180Z 2025-12-04T12:44:29.1472404Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_60de516b7e1e2204_.log) 2025-12-04T12:44:29.1472407Z 2025-12-04T12:44:29.1472549Z Finished distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ... [2025-12-04 12:44:29.028174][5230310.007213409], took 5.24min 2025-12-04T12:44:29.1472820Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:44:29.1472908Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:44:29.1473007Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:44:29.1473055Z Uploading artifacts took 0.00 seconds 2025-12-04T12:44:29.1473129Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed! 2025-12-04T12:44:29.1473247Z Running distributed/fsdp/test_fsdp_hybrid_shard 1/1 ... [2025-12-04 12:44:29.031021][5230310.010063092] 2025-12-04T12:44:29.1473299Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:44:29.1473625Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_hybrid_shard.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:44:29.031206] 2025-12-04T12:45:30.0435109Z 2025-12-04T12:45:30.0436233Z distributed/fsdp/test_fsdp_hybrid_shard 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_hybrid_shard_1.1_e89f2503325d3e91_.log 2025-12-04T12:45:30.0438965Z Running 6 items in this shard: test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_fsdp_hybrid_shard_basic_setup, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_fsdp_hybrid_shard_parity, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_hsdp_save_load_state_dict, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_hsdp_sync_module_state, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_invalid_pg_specification_raises, test/distributed/fsdp/test_fsdp_hybrid_shard.py::TestFSDPHybridShard::test_raises_manual_wrap_hybrid_shard_when_none_policy 2025-12-04T12:45:30.0440743Z 2025-12-04T12:45:30.0440972Z Finished distributed/fsdp/test_fsdp_hybrid_shard 1/1 ... [2025-12-04 12:45:30.043272][5230371.02231184], took 1.02min 2025-12-04T12:45:30.0449716Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:45:30.0459107Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:45:30.0461633Z Running distributed/_composable/fsdp/test_fully_shard_training 1/1 ... [2025-12-04 12:45:30.046068][5230371.025108924] 2025-12-04T12:45:30.0461962Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:45:30.0463664Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_training.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:45:30.046260] 2025-12-04T12:54:16.2445970Z 2025-12-04T12:54:16.2447190Z distributed/_composable/fsdp/test_fully_shard_training 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_training_1.1_30a13ba1cb8fc7b7_.log 2025-12-04T12:54:16.2497292Z Running 25 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardForwardInputs::test_root_move_forward_input_to_device, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardRegisteredParams::test_param_registration_after_backward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardRegisteredParams::test_param_registration_after_forward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardCastAfterInit::test_to_float64_after_init, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_explicit_prefetching, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_multi_forward_module, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_non_root_forward_backward, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_post_optim_event, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_cpu_offload_eager, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_multi_group_unshard_async_op, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_single_group_shard_dim0, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCore::test_train_parity_single_group_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShard1DTrainingCompose::test_train_parity_with_activation_checkpointing, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShardPlacementFnMultiProcess::test_train_parity_shard_placement_fn_shard_largest_dim, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShardPlacementFnMultiThread::test_shard_placement_fn_contiguous_params_grads, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardSharedParams::test_train_parity_with_shared_params, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardGradientAccumulation::test_1f1b_microbatching, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardGradientAccumulation::test_gradient_accumulation, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardNDTraining::test_2d_mlp_with_nd_mesh, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardHSDP3DTraining::test_3d_mlp_with_nd_mesh, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardHSDPTraining::test_train_parity_hsdp, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardCustomForwardMethod::test_register_fsdp_forward_method, test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardShareCommContext::test_share_comm_context, 
test/distributed/_composable/fsdp/test_fully_shard_training.py::TestFullyShardWorldSize1::test_train_parity_single_worldsize1 2025-12-04T12:54:16.2719746Z 2025-12-04T12:54:16.2721763Z Finished distributed/_composable/fsdp/test_fully_shard_training 1/1 ... [2025-12-04 12:54:16.271871][5230897.250900075], took 8.77min 2025-12-04T12:54:16.2736913Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:54:16.2747282Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:54:16.2749159Z Running distributed/fsdp/test_fsdp_multiple_forward 1/1 ... [2025-12-04 12:54:16.274822][5230897.25386105] 2025-12-04T12:54:16.2749622Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:54:16.2752210Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_multiple_forward.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:54:16.275081] 2025-12-04T12:54:18.3477587Z 2025-12-04T12:54:18.3478650Z distributed/fsdp/test_fsdp_multiple_forward 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_multiple_forward_1.1_acc0220e24e9c56a_.log 2025-12-04T12:54:18.3479336Z Running 0 items in this shard: 2025-12-04T12:54:18.3479465Z 2025-12-04T12:54:18.3479782Z Finished distributed/fsdp/test_fsdp_multiple_forward 1/1 ... [2025-12-04 12:54:18.347491][5230899.326526629], took 0.03min 2025-12-04T12:54:18.3500576Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:54:18.3508989Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:54:18.3511419Z Running distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 12:54:18.351039][5230899.330080367] 2025-12-04T12:54:18.3511782Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:54:18.3513878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:54:18.351262] 2025-12-04T12:56:53.7036885Z 2025-12-04T12:56:53.7037563Z distributed/checkpoint/test_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_1.1_ac0a269fa24e4fe1_.log 2025-12-04T12:56:53.7046613Z Running 25 items in this shard: test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_fsdp1, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_compiled_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_cpu_offload_full_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_deprecate_api, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_extra_state, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_flattened_osd, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp2, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_root_not_initialized, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_device_load_model_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_param_groups, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_non_persistent_buffers, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_optim_state_dict_param_matching, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_set_cpu_model_state_dict_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model_broadcasting_and_memory, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_shared_weight, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_single_gpu, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_state_dict_with_hook_on_keys, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_strict, test/distributed/checkpoint/test_state_dict.py::TestNoComm::test_no_dist 2025-12-04T12:56:53.7052294Z 2025-12-04T12:56:53.7052515Z Finished distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 12:56:53.705001][5231054.684038806], took 2.59min 2025-12-04T12:56:53.7069530Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T12:56:53.7078101Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:56:53.7080554Z Running distributed/fsdp/test_fsdp_core 1/2 ... [2025-12-04 12:56:53.707951][5231054.686992511] 2025-12-04T12:56:53.7080928Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:56:53.7083049Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:56:53.708170] 2025-12-04T13:38:31.9224615Z 2025-12-04T13:38:31.9227424Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 1/2 (test/test-reports/distributed.fsdp.test_fsdp_core_1.2_d5d5bc8f8345486d_.log) 2025-12-04T13:38:31.9227931Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bfe1494716c9ba3f.xml 2025-12-04T13:38:31.9228249Z ============================= test session starts ============================== 2025-12-04T13:38:31.9228505Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9228702Z cachedir: .pytest_cache 2025-12-04T13:38:31.9228929Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9229181Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9229313Z configfile: pytest.ini 2025-12-04T13:38:31.9229962Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9230217Z collecting ... collected 60 items 2025-12-04T13:38:31.9230361Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T13:38:31.9236115Z Running 33 items in this shard: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda, 
test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:31.9242080Z 2025-12-04T13:38:31.9242396Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda I1204 12:56:55.485000 377415 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 377484 2025-12-04T13:38:31.9242908Z I1204 12:56:55.486000 377415 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 377485 2025-12-04T13:38:31.9243338Z I1204 12:56:55.486000 377415 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 377486 2025-12-04T13:38:31.9243682Z I1204 12:56:55.487000 377415 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 377487 2025-12-04T13:38:31.9244241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9244690Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9245275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9245907Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9246359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9247096Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9247532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9247972Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9248567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9249231Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9249826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9250337Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9250914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9251660Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9252241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
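The device_id UserWarnings above list two fixes: call torch.cuda.set_device() per rank before constructing FSDP, or pass an explicit device index as the device_id argument. The sketch below shows both, assuming a torchrun launch with one process per GPU; the Linear module, tensor shapes, and file name are placeholders.

# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_device_id_sketch.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")  # maps to RCCL on ROCm
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)           # fix 1: make the current device explicit
    module = torch.nn.Linear(8, 8)
    model = FSDP(module, device_id=local_rank)  # fix 2: pass the index, not a bare "cuda"
    model(torch.randn(2, 8, device="cuda")).sum().backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()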
2025-12-04T13:38:31.9252826Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9253155Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9253496Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9254008Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9254489Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9254973Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9255440Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9255923Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9256470Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9256933Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9257401Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9257867Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9258318Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9258774Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9259240Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9259978Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 2300575744 and is now 3380609024. 
2025-12-04T13:38:31.9260609Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9260960Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9261579Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9262080Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9262446Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9262863Z [rank2]:E1204 12:57:02.822000 377486 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9263106Z dist init r=2, world=4 2025-12-04T13:38:31.9263325Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9263668Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9264168Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9264665Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9265168Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9265660Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9266140Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9266605Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9267072Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9267541Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9268013Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9268515Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9269017Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9269516Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9270228Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 2250244096 and is now 3330277376. 2025-12-04T13:38:31.9270988Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9271354Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9272019Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9272523Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9272908Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9273357Z [rank3]:E1204 12:57:02.827000 377487 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9273633Z dist init r=3, world=4 2025-12-04T13:38:31.9273839Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9274190Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9274718Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9275231Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9275755Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9276200Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9276642Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9277108Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:31.9277571Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9278057Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9278531Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9278982Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9279442Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9279966Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9280686Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 2317352960 and is now 3397386240. 2025-12-04T13:38:31.9281329Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9281675Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9282281Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9282788Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9283159Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9283569Z [rank1]:E1204 12:57:02.828000 377485 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9283808Z dist init r=1, world=4 2025-12-04T13:38:31.9284007Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9284365Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9284855Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9285347Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:31.9285826Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9286277Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9286717Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9287181Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9287643Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9288156Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9288657Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9289109Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9289664Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9290131Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9290791Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 
2025-12-04T13:38:31.9291410Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9291798Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9292381Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9292878Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9293245Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9293690Z [rank0]:E1204 12:57:02.880000 377484 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9293929Z dist init r=0, world=4 2025-12-04T13:38:31.9294347Z [rank0]:[W1204 12:57:03.136665255 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9295003Z FAILED [9.3193s] [ 3%] 2025-12-04T13:38:31.9295079Z 2025-12-04T13:38:31.9295162Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9295415Z ____ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda ____ 2025-12-04T13:38:31.9295609Z Traceback (most recent call last): 2025-12-04T13:38:31.9295887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9296145Z self._join_processes(fn) 2025-12-04T13:38:31.9296407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9296677Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9296956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9297224Z raise RuntimeError(error) 2025-12-04T13:38:31.9297386Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9297555Z Traceback (most recent call last): 2025-12-04T13:38:31.9297804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9298064Z getattr(self, test_name)() 2025-12-04T13:38:31.9298309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9298552Z fn() 2025-12-04T13:38:31.9298783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9299245Z method(*args, **kwargs) 2025-12-04T13:38:31.9299666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9336557Z method(*args, **kwargs) 2025-12-04T13:38:31.9336865Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9337105Z with policy(): 2025-12-04T13:38:31.9337333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9337578Z raise RuntimeError(msg) 2025-12-04T13:38:31.9338005Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 2317352960 and is now 3397386240. 2025-12-04T13:38:31.9338447Z 2025-12-04T13:38:31.9338583Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9339027Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9339340Z 2025-12-04T13:38:31.9339438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9339629Z 2025-12-04T13:38:31.9339699Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9339893Z Traceback (most recent call last): 2025-12-04T13:38:31.9340211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9340553Z getattr(self, test_name)() 2025-12-04T13:38:31.9340855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9341110Z fn() 2025-12-04T13:38:31.9341328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9341616Z method(*args, **kwargs) 2025-12-04T13:38:31.9341884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9342198Z method(*args, **kwargs) 2025-12-04T13:38:31.9342432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9342678Z with policy(): 2025-12-04T13:38:31.9342929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9343189Z raise RuntimeError(msg) 2025-12-04T13:38:31.9343645Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 2300575744 and is now 3380609024. 
2025-12-04T13:38:31.9344065Z 2025-12-04T13:38:31.9344200Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9344555Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9344816Z 2025-12-04T13:38:31.9344913Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9345046Z 2025-12-04T13:38:31.9345108Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9345258Z Traceback (most recent call last): 2025-12-04T13:38:31.9345514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9345766Z getattr(self, test_name)() 2025-12-04T13:38:31.9346008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9346277Z fn() 2025-12-04T13:38:31.9346566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9346805Z method(*args, **kwargs) 2025-12-04T13:38:31.9347032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9347269Z method(*args, **kwargs) 2025-12-04T13:38:31.9347496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9347761Z with policy(): 2025-12-04T13:38:31.9348027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9348268Z raise RuntimeError(msg) 2025-12-04T13:38:31.9348719Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 2250244096 and is now 3330277376. 2025-12-04T13:38:31.9349106Z 2025-12-04T13:38:31.9349184Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9349524Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9349877Z 2025-12-04T13:38:31.9349972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9350172Z 2025-12-04T13:38:31.9350174Z 2025-12-04T13:38:31.9350258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9350469Z Process 1 terminated with exit code 10, terminating remaining processes. 
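Note: the ProcessGroupNCCL warning earlier in this failure ("destroy_process_group() was not called before program exit, which can leak resources") points at missing teardown in the spawned workers. In a standalone script the usual pattern is to destroy the group explicitly in a finally block; a minimal sketch assuming a torchrun-style launch that provides RANK and WORLD_SIZE (the empty body stands in for the real work):

import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        pass  # training / test body goes here
    finally:
        dist.destroy_process_group()  # release communicators before the process exits

if __name__ == "__main__":
    main()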
2025-12-04T13:38:31.9350845Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bfe1494716c9ba3f.xml - 2025-12-04T13:38:31.9351220Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9351611Z FAILED [9.3193s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9352000Z Traceback (most recent call last): 2025-12-04T13:38:31.9352255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9352508Z getattr(self, test_name)() 2025-12-04T13:38:31.9352756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9353037Z fn() 2025-12-04T13:38:31.9353330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9353575Z method(*args, **kwargs) 2025-12-04T13:38:31.9353858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9354099Z method(*args, **kwargs) 2025-12-04T13:38:31.9354330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9354565Z with policy(): 2025-12-04T13:38:31.9354783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9355025Z raise RuntimeError(msg) 2025-12-04T13:38:31.9355448Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 2317352960 and is now 3397386240. 
2025-12-04T13:38:31.9355833Z 2025-12-04T13:38:31.9355937Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9356281Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9356540Z 2025-12-04T13:38:31.9356636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9356762Z 2025-12-04T13:38:31.9356827Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9356977Z Traceback (most recent call last): 2025-12-04T13:38:31.9357229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9357479Z getattr(self, test_name)() 2025-12-04T13:38:31.9357733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9357976Z fn() 2025-12-04T13:38:31.9358189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9358427Z method(*args, **kwargs) 2025-12-04T13:38:31.9358654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9358892Z method(*args, **kwargs) 2025-12-04T13:38:31.9359118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9359367Z with policy(): 2025-12-04T13:38:31.9359678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9360304Z raise RuntimeError(msg) 2025-12-04T13:38:31.9361853Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 2300575744 and is now 3380609024. 
2025-12-04T13:38:31.9363350Z 2025-12-04T13:38:31.9363554Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9364431Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9365106Z 2025-12-04T13:38:31.9365346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9365690Z 2025-12-04T13:38:31.9365847Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9366213Z Traceback (most recent call last): 2025-12-04T13:38:31.9366864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9367498Z getattr(self, test_name)() 2025-12-04T13:38:31.9368096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9368647Z fn() 2025-12-04T13:38:31.9369049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9369506Z method(*args, **kwargs) 2025-12-04T13:38:31.9370011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9370466Z method(*args, **kwargs) 2025-12-04T13:38:31.9370899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9371349Z with policy(): 2025-12-04T13:38:31.9371772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9372235Z raise RuntimeError(msg) 2025-12-04T13:38:31.9373159Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 2250244096 and is now 3330277376. 2025-12-04T13:38:31.9373958Z 2025-12-04T13:38:31.9374105Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9374768Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9375287Z 2025-12-04T13:38:31.9375461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9375840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9376239Z ============================== 1 failed in 9.48s =============================== 2025-12-04T13:38:31.9376508Z Got exit code 1 2025-12-04T13:38:31.9376703Z Retrying single test... 
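Note: before the retry, the failure block prints a repro command to run from the base repo dir and notes that the hint itself can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0. A hedged sketch of driving that same command from Python with the same environment variables (the wrapper is illustrative; the command and variables are taken verbatim from the log above):

import os
import subprocess

env = dict(
    os.environ,
    PYTORCH_TEST_WITH_ROCM="1",
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
)
# Uncomment to suppress the repro hint in the failure message:
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"

subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda",
    ],
    env=env,
    check=True,  # raise CalledProcessError if the test exits non-zero
)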
2025-12-04T13:38:31.9377211Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8a029b4db9ef552d.xml 2025-12-04T13:38:31.9377771Z ============================= test session starts ============================== 2025-12-04T13:38:31.9378208Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9378580Z cachedir: .pytest_cache 2025-12-04T13:38:31.9379019Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9379541Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9379780Z configfile: pytest.ini 2025-12-04T13:38:31.9380108Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9380543Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9381038Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9381474Z Running 1 items in this shard 2025-12-04T13:38:31.9381582Z 2025-12-04T13:38:31.9382018Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda I1204 12:57:07.159000 377817 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 377886 2025-12-04T13:38:31.9382722Z I1204 12:57:07.159000 377817 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 377887 2025-12-04T13:38:31.9383228Z I1204 12:57:07.160000 377817 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 377888 2025-12-04T13:38:31.9383728Z I1204 12:57:07.160000 377817 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 377889 2025-12-04T13:38:31.9384530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9385161Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9385786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9386417Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9387291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9388134Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9388792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9389282Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9389983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9390648Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9391153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9391635Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9392272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9392930Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9393591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9394242Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9394517Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9394905Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9395456Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9395994Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9396537Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9397056Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9397559Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9398088Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9398624Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9399091Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9399557Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9400080Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9400557Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9401031Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9401699Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 
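Note: the UserWarning from torch/nn/modules/transformer.py repeated above fires because the encoder layer was built without batch_first=True, which disables the nested-tensor fast path. A minimal sketch of constructing the encoder so that warning does not trigger (d_model, nhead, and num_layers are illustrative values, not the test's actual configuration):

import torch.nn as nn

# batch_first=True on the layer lets enable_nested_tensor take effect on the encoder.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6, enable_nested_tensor=True)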
2025-12-04T13:38:31.9402374Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9402729Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9403343Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9403852Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9404228Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9404653Z [rank0]:E1204 12:57:14.348000 377886 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9404901Z dist init r=0, world=4 2025-12-04T13:38:31.9405113Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9405472Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9405967Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9406450Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9406931Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9407383Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9407848Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9408319Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9408796Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9409266Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9409799Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9410259Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9410722Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9411204Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9411864Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 2300575744 and is now 3380609024. 2025-12-04T13:38:31.9412526Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9412882Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9413474Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9413980Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9414351Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9414775Z [rank2]:E1204 12:57:14.356000 377888 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9415025Z dist init r=2, world=4 2025-12-04T13:38:31.9415235Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9415577Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9416070Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9416555Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9417044Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9417518Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9417970Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9418444Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:31.9418916Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9419405Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9419927Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9420383Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9420842Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9421328Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9421994Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 2250244096 and is now 3330277376. 2025-12-04T13:38:31.9422628Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9422982Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9423574Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9424084Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9424455Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9424874Z [rank3]:E1204 12:57:14.368000 377889 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9425123Z dist init r=3, world=4 2025-12-04T13:38:31.9425333Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9425677Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9426172Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9426679Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:31.9427161Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9427614Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9428059Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9428533Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9429028Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9429495Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9430045Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9430526Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9430994Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9431499Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9432166Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 2317352960 and is now 3397386240. 
2025-12-04T13:38:31.9432791Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9433148Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9433753Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9434260Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9434633Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9435053Z [rank1]:E1204 12:57:14.376000 377887 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9435302Z dist init r=1, world=4 2025-12-04T13:38:31.9435731Z [rank0]:[W1204 12:57:14.510330657 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9436151Z FAILED [9.0196s] [100%] 2025-12-04T13:38:31.9436243Z 2025-12-04T13:38:31.9436306Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9436504Z ____ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda ____ 2025-12-04T13:38:31.9436691Z Traceback (most recent call last): 2025-12-04T13:38:31.9436946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9437200Z self._join_processes(fn) 2025-12-04T13:38:31.9437459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9437736Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9438035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9438307Z raise RuntimeError(error) 2025-12-04T13:38:31.9438473Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9438650Z Traceback (most recent call last): 2025-12-04T13:38:31.9438902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9439153Z getattr(self, test_name)() 2025-12-04T13:38:31.9439396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9439704Z fn() 2025-12-04T13:38:31.9439917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9440160Z method(*args, **kwargs) 2025-12-04T13:38:31.9440397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9440666Z method(*args, **kwargs) 2025-12-04T13:38:31.9440899Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9441138Z with policy(): 2025-12-04T13:38:31.9441362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9441605Z raise RuntimeError(msg) 2025-12-04T13:38:31.9442032Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 2025-12-04T13:38:31.9442421Z 2025-12-04T13:38:31.9442506Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9442855Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9443121Z 2025-12-04T13:38:31.9443220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9443348Z 2025-12-04T13:38:31.9443350Z 2025-12-04T13:38:31.9443440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9443652Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9444025Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8a029b4db9ef552d.xml - 2025-12-04T13:38:31.9444369Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9444722Z FAILED [9.0196s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9445057Z Traceback (most recent call last): 2025-12-04T13:38:31.9445337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9445591Z getattr(self, test_name)() 2025-12-04T13:38:31.9445835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9446077Z fn() 2025-12-04T13:38:31.9446288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9446531Z method(*args, **kwargs) 2025-12-04T13:38:31.9446762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9447002Z method(*args, **kwargs) 2025-12-04T13:38:31.9447247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9447489Z with policy(): 2025-12-04T13:38:31.9447718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9447961Z raise RuntimeError(msg) 2025-12-04T13:38:31.9448385Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 
2025-12-04T13:38:31.9448780Z 2025-12-04T13:38:31.9448863Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9449208Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9449476Z 2025-12-04T13:38:31.9449620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9449840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9450018Z ======================= 1 failed, 32 deselected in 9.18s ======================= 2025-12-04T13:38:31.9450167Z Got exit code 1 2025-12-04T13:38:31.9450278Z Retrying single test... 2025-12-04T13:38:31.9450545Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a24c8ddda618e13.xml 2025-12-04T13:38:31.9450839Z ============================= test session starts ============================== 2025-12-04T13:38:31.9451064Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9451266Z cachedir: .pytest_cache 2025-12-04T13:38:31.9451501Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9451756Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9451887Z configfile: pytest.ini 2025-12-04T13:38:31.9452130Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9452414Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9452757Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9453069Z Running 1 items in this shard 2025-12-04T13:38:31.9453152Z 2025-12-04T13:38:31.9453458Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda I1204 12:57:18.775000 378219 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 378288 2025-12-04T13:38:31.9453970Z I1204 12:57:18.776000 378219 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 378289 2025-12-04T13:38:31.9454343Z I1204 12:57:18.777000 378219 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 378290 2025-12-04T13:38:31.9454698Z I1204 12:57:18.777000 378219 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 378291 2025-12-04T13:38:31.9455257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9455710Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9456321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9456920Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9457384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9457837Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9458416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9459035Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9459498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9459983Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9460423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9460866Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9461446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9462042Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9462638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9463232Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9463482Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9463835Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9464352Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9464843Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9465329Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9465789Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9466253Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9466727Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9467200Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9467673Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9468165Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9468629Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9469118Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9469667Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9470340Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 2317352960 and is now 3397386240. 
2025-12-04T13:38:31.9470969Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9471329Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9471922Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9472431Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9472806Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9473228Z [rank1]:E1204 12:57:25.942000 378289 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9473481Z dist init r=1, world=4 2025-12-04T13:38:31.9473708Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9474055Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9474555Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9475048Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9475552Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9476018Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9476465Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9476943Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9477434Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9477906Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9478399Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9478862Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9479332Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9479834Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9480503Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 2025-12-04T13:38:31.9481131Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9481489Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9482086Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9482596Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9482984Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9483404Z [rank0]:E1204 12:57:25.945000 378288 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9483652Z dist init r=0, world=4 2025-12-04T13:38:31.9483863Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9484207Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9484705Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9485207Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9485696Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9486151Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9486603Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9487094Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:31.9487573Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9488058Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9488530Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9488997Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9489468Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9490007Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9490786Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 2300575744 and is now 3380609024. 2025-12-04T13:38:31.9491407Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9491762Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9492371Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9492883Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9493255Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9493675Z [rank2]:E1204 12:57:25.954000 378290 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9493927Z dist init r=2, world=4 2025-12-04T13:38:31.9494137Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9494495Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9494993Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9495479Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:31.9495971Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9496449Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9496899Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9497383Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9497856Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9498329Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9498857Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9499434Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9499978Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9500486Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9501184Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 2250244096 and is now 3330277376. 
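A quick check of the figures reported above, all copied from this log: on every rank the caching-allocator counter grew from 512 to 19456 bytes (19456 - 512 = 18944 bytes), and the driver-level counter grew by an identical amount on all four devices: 3533701120 - 2453667840 = 1080033280 bytes on device 0, 3397386240 - 2317352960 = 1080033280 on device 1, 3380609024 - 2300575744 = 1080033280 on device 2, and 3330277376 - 2250244096 = 1080033280 on device 3, i.e. exactly 1030 MiB (about 1.01 GiB) per GPU. The uniform delta suggests a fixed-size allocation surviving the test on every rank rather than rank-specific noise.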
2025-12-04T13:38:31.9501836Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9502264Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9502886Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9503414Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9503831Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9504277Z [rank3]:E1204 12:57:25.994000 378291 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9504583Z dist init r=3, world=4 2025-12-04T13:38:31.9505046Z [rank0]:[W1204 12:57:26.110710486 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9505487Z FAILED [9.0183s] [100%] 2025-12-04T13:38:31.9505572Z 2025-12-04T13:38:31.9505662Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9505892Z ____ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda ____ 2025-12-04T13:38:31.9506122Z Traceback (most recent call last): 2025-12-04T13:38:31.9506405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9506689Z self._join_processes(fn) 2025-12-04T13:38:31.9506982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9507305Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9507602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9507921Z raise RuntimeError(error) 2025-12-04T13:38:31.9508104Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9508303Z Traceback (most recent call last): 2025-12-04T13:38:31.9508591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9508870Z getattr(self, test_name)() 2025-12-04T13:38:31.9509153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9509420Z fn() 2025-12-04T13:38:31.9509705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9509991Z method(*args, **kwargs) 2025-12-04T13:38:31.9510250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9510518Z method(*args, **kwargs) 2025-12-04T13:38:31.9510783Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9511044Z with policy(): 2025-12-04T13:38:31.9511305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9511573Z raise RuntimeError(msg) 2025-12-04T13:38:31.9512014Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 2025-12-04T13:38:31.9512442Z 2025-12-04T13:38:31.9512546Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9512916Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9513197Z 2025-12-04T13:38:31.9513306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9513462Z 2025-12-04T13:38:31.9513464Z 2025-12-04T13:38:31.9513555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9513796Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9514203Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0a24c8ddda618e13.xml - 2025-12-04T13:38:31.9514583Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9514960Z FAILED [9.0183s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9515347Z Traceback (most recent call last): 2025-12-04T13:38:31.9515619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9515926Z getattr(self, test_name)() 2025-12-04T13:38:31.9516198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9516473Z fn() 2025-12-04T13:38:31.9516729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9516997Z method(*args, **kwargs) 2025-12-04T13:38:31.9517264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9517555Z method(*args, **kwargs) 2025-12-04T13:38:31.9517806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9518082Z with policy(): 2025-12-04T13:38:31.9518329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9518590Z raise RuntimeError(msg) 2025-12-04T13:38:31.9519054Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 2453667840 and is now 3533701120. 
2025-12-04T13:38:31.9519441Z 2025-12-04T13:38:31.9519539Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9519962Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9520240Z 2025-12-04T13:38:31.9520355Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9520577Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9520797Z ======================= 1 failed, 32 deselected in 9.16s ======================= 2025-12-04T13:38:31.9520973Z Got exit code 1 2025-12-04T13:38:31.9521231Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda 2025-12-04T13:38:31.9521619Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:31.9522006Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-91743904ac89e77a.xml 2025-12-04T13:38:31.9522345Z ============================= test session starts ============================== 2025-12-04T13:38:31.9522596Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9522823Z cachedir: .pytest_cache 2025-12-04T13:38:31.9523093Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9523363Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9523514Z configfile: pytest.ini 2025-12-04T13:38:31.9523793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9524106Z collecting ... collected 60 items / 1 deselected / 59 selected 2025-12-04T13:38:31.9524306Z stepcurrent: skipping 1 already run items. 2025-12-04T13:38:31.9524476Z Running 32 items in this shard 2025-12-04T13:38:31.9524574Z 2025-12-04T13:38:31.9524930Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda I1204 12:57:30.534000 378621 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 378690 2025-12-04T13:38:31.9525495Z I1204 12:57:30.535000 378621 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 378691 2025-12-04T13:38:31.9525873Z I1204 12:57:30.535000 378621 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 378692 2025-12-04T13:38:31.9526239Z I1204 12:57:30.536000 378621 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 378693 2025-12-04T13:38:31.9526878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9527358Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9527982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9528601Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9529081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9529619Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9530224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9530848Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9531341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9531805Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9532293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9532789Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9533399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9534018Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9534651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
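The _init_utils.py UserWarning above offers two remedies for the index-less `device_id` "cuda" argument: pin the current device with torch.cuda.set_device() before constructing FSDP, or pass an explicit per-rank device index as `device_id`. A minimal sketch of both options, where rank and model are hypothetical placeholders supplied by the surrounding training script:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Option 1: pin the current CUDA device for this rank before wrapping the model.
torch.cuda.set_device(rank)
fsdp_model = FSDP(model)

# Option 2: pass an explicit per-rank device index instead of the bare "cuda" device.
fsdp_model = FSDP(model, device_id=torch.device("cuda", rank))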
2025-12-04T13:38:31.9535280Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9535549Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9535925Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9536461Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9536996Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9537513Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9538018Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9538495Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9539011Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9539517Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9540054Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9540558Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9541044Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9541546Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9542048Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9542775Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:38:31.9543470Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9543851Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9544522Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9545091Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9545488Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9545952Z [rank2]:E1204 12:57:36.503000 378692 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9546229Z dist init r=2, world=4 2025-12-04T13:38:31.9546477Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9546865Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9547386Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9547919Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9548425Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9548905Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9549392Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9549929Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9550437Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9550957Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9551445Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9551951Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:31.9552441Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9552957Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9553687Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:38:31.9554361Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9554753Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9555414Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9555992Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9556398Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9556856Z [rank1]:E1204 12:57:36.505000 378691 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9557146Z dist init r=1, world=4 2025-12-04T13:38:31.9557382Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9557766Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9558304Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9558812Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9559328Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9559848Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9560325Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9560831Z [rank0]:E1204 12:57:36.547000 378690 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9561331Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9561827Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9562341Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9562858Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9563358Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9563857Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9564561Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9565273Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9565657Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9566316Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9566894Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9567293Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9613504Z [rank0]:E1204 12:57:36.547000 378690 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9613776Z dist init r=0, world=4 2025-12-04T13:38:31.9613997Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9614344Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9614842Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:31.9615335Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9615813Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9616259Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9616695Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9617159Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9617622Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9618081Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9618652Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9619103Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9619551Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9620049Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9620766Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
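The same pattern shows up in the numbers for this second test: the caching allocator goes from 512 to 12800 bytes on each rank (12800 - 512 = 12288 bytes), and the driver-level growth is again identical across devices, e.g. 3229614080 - 2453667840 = 775946240 bytes on device 0 and 3026190336 - 2250244096 = 775946240 on device 3, i.e. exactly 740 MiB per GPU, smaller than in the previous test but still uniform across all ranks.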
2025-12-04T13:38:31.9621407Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9621754Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9622385Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9622932Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9623295Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9623706Z [rank3]:E1204 12:57:36.559000 378693 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9623945Z dist init r=3, world=4 2025-12-04T13:38:31.9624347Z [rank0]:[W1204 12:57:36.799763242 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9624752Z FAILED [7.7175s] [ 3%] 2025-12-04T13:38:31.9624816Z 2025-12-04T13:38:31.9624877Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9625090Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda _ 2025-12-04T13:38:31.9625288Z Traceback (most recent call last): 2025-12-04T13:38:31.9625532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9625776Z self._join_processes(fn) 2025-12-04T13:38:31.9626020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9626283Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9626550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9626807Z raise RuntimeError(error) 2025-12-04T13:38:31.9626956Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9627115Z Traceback (most recent call last): 2025-12-04T13:38:31.9627366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9627605Z getattr(self, test_name)() 2025-12-04T13:38:31.9627834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9628065Z fn() 2025-12-04T13:38:31.9628263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9628492Z method(*args, **kwargs) 2025-12-04T13:38:31.9628710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9628935Z method(*args, **kwargs) 
2025-12-04T13:38:31.9629150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9629387Z with policy(): 2025-12-04T13:38:31.9629631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9629860Z raise RuntimeError(msg) 2025-12-04T13:38:31.9630293Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:38:31.9630695Z 2025-12-04T13:38:31.9630787Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9631149Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9631434Z 2025-12-04T13:38:31.9631525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9631662Z 2025-12-04T13:38:31.9631722Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9631861Z Traceback (most recent call last): 2025-12-04T13:38:31.9632103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9632344Z getattr(self, test_name)() 2025-12-04T13:38:31.9632573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9632803Z fn() 2025-12-04T13:38:31.9633001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9633230Z method(*args, **kwargs) 2025-12-04T13:38:31.9633446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9633674Z method(*args, **kwargs) 2025-12-04T13:38:31.9633889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9634112Z with policy(): 2025-12-04T13:38:31.9634320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9634549Z raise RuntimeError(msg) 2025-12-04T13:38:31.9634982Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:38:31.9635385Z 2025-12-04T13:38:31.9635458Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9635815Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9636114Z 2025-12-04T13:38:31.9636202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9636325Z 2025-12-04T13:38:31.9636327Z 2025-12-04T13:38:31.9636405Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9636604Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9636963Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-91743904ac89e77a.xml - 2025-12-04T13:38:31.9637287Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9637672Z FAILED [7.7175s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9638009Z Traceback (most recent call last): 2025-12-04T13:38:31.9638252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9638492Z getattr(self, test_name)() 2025-12-04T13:38:31.9638720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9638950Z fn() 2025-12-04T13:38:31.9639147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9639384Z method(*args, **kwargs) 2025-12-04T13:38:31.9639630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9639856Z method(*args, **kwargs) 2025-12-04T13:38:31.9640071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9640307Z with policy(): 2025-12-04T13:38:31.9640517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9640744Z raise RuntimeError(msg) 2025-12-04T13:38:31.9641178Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9641580Z 2025-12-04T13:38:31.9641653Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9642008Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9642286Z 2025-12-04T13:38:31.9642375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9642497Z 2025-12-04T13:38:31.9642557Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9642691Z Traceback (most recent call last): 2025-12-04T13:38:31.9642928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9643167Z getattr(self, test_name)() 2025-12-04T13:38:31.9643396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9643625Z fn() 2025-12-04T13:38:31.9643820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9644046Z method(*args, **kwargs) 2025-12-04T13:38:31.9644261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9644486Z method(*args, **kwargs) 2025-12-04T13:38:31.9644723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9644946Z with policy(): 2025-12-04T13:38:31.9645152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9645383Z raise RuntimeError(msg) 2025-12-04T13:38:31.9645811Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9646212Z 2025-12-04T13:38:31.9646287Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9646653Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9646934Z 2025-12-04T13:38:31.9647020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9647204Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9647367Z ======================= 1 failed, 1 deselected in 7.86s ======================== 2025-12-04T13:38:31.9647503Z Got exit code 1 2025-12-04T13:38:31.9647597Z Retrying single test... 
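[editor's note] The RuntimeError above is raised by the memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns on: the test wrapper records per-device allocation counters (caching allocator and driver-level) before the test and fails if they have grown afterwards, which is what "was 512 and is now reported as 12800" refers to. As a rough, hedged sketch of that before/after comparison only — the real wrapper lives in torch.testing._internal.common_utils and does considerably more, and run_with_rough_leak_check below is a hypothetical helper, not PyTorch's code:

    import gc
    import torch

    def run_with_rough_leak_check(fn, device: int = 0) -> None:
        # Illustration of the leak-check idea using public torch.cuda counters.
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)
        fn()
        # Drop Python references and cached blocks before re-measuring.
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.synchronize(device)
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak on device {device}: {before} -> {after} bytes"
            )

The repro line printed in the summary re-runs just this test with the same check enabled, so the comparison can be reproduced outside CI.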
2025-12-04T13:38:31.9647862Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27724013ff9ffa03.xml 2025-12-04T13:38:31.9648139Z ============================= test session starts ============================== 2025-12-04T13:38:31.9648347Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9648543Z cachedir: .pytest_cache 2025-12-04T13:38:31.9648763Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9649000Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9649115Z configfile: pytest.ini 2025-12-04T13:38:31.9649340Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9649683Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9650033Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9650349Z Running 1 items in this shard 2025-12-04T13:38:31.9650421Z 2025-12-04T13:38:31.9650749Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda I1204 12:57:40.863000 379007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 379076 2025-12-04T13:38:31.9651260Z I1204 12:57:40.863000 379007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 379077 2025-12-04T13:38:31.9651601Z I1204 12:57:40.864000 379007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 379078 2025-12-04T13:38:31.9651937Z I1204 12:57:40.865000 379007 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 379079 2025-12-04T13:38:31.9652482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9652923Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9653525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9654110Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9654554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9654986Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9655428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9655858Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9656283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9656709Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9657284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9657878Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9658465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9659044Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9659657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9660231Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9660468Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9660807Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9661292Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9661768Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9662245Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9662706Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9663146Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9663604Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9664070Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9664525Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9664995Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9665439Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9665887Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9666360Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9667043Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9667691Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9668034Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9668638Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9669161Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9669524Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9669976Z [rank1]:E1204 12:57:47.048000 379077 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9670213Z dist init r=1, world=4 2025-12-04T13:38:31.9670420Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9670758Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9671244Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9671742Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9672222Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9672676Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9673116Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9673578Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9674055Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9674521Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9674985Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9675450Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:31.9675906Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9676388Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9677067Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9677710Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9678058Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9678674Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9679198Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9679561Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9680023Z [rank3]:E1204 12:57:47.071000 379079 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9680265Z dist init r=3, world=4 2025-12-04T13:38:31.9680469Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9680806Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9681306Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9681784Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9682259Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9682707Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9683158Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9683622Z [rank2]:E1204 12:57:47.087000 379078 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9684085Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9684547Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9685022Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9685477Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9685949Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9686414Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9687093Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9687734Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9688084Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9688694Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9689216Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9689621Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9690036Z [rank2]:E1204 12:57:47.087000 379078 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9690280Z dist init r=2, world=4 2025-12-04T13:38:31.9690499Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9690837Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9691328Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:31.9691811Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9692303Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9692751Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9693190Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9693651Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9694128Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9694589Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9695064Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9695514Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9695969Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9696434Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9697117Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
2025-12-04T13:38:31.9697759Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9698108Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9698717Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9699242Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9699645Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9700058Z [rank0]:E1204 12:57:47.102000 379076 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9700303Z dist init r=0, world=4 2025-12-04T13:38:31.9700704Z [rank0]:[W1204 12:57:47.435284909 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9701114Z FAILED [8.0172s] [100%] 2025-12-04T13:38:31.9701178Z 2025-12-04T13:38:31.9701242Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9701476Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda _ 2025-12-04T13:38:31.9701678Z Traceback (most recent call last): 2025-12-04T13:38:31.9701927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9702175Z self._join_processes(fn) 2025-12-04T13:38:31.9702421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9702690Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9702961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9703236Z raise RuntimeError(error) 2025-12-04T13:38:31.9703390Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9703553Z Traceback (most recent call last): 2025-12-04T13:38:31.9703794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9704050Z getattr(self, test_name)() 2025-12-04T13:38:31.9704286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9704521Z fn() 2025-12-04T13:38:31.9704723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9704959Z method(*args, **kwargs) 2025-12-04T13:38:31.9705180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9705415Z method(*args, **kwargs) 
2025-12-04T13:38:31.9705639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9705869Z with policy(): 2025-12-04T13:38:31.9706084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9706320Z raise RuntimeError(msg) 2025-12-04T13:38:31.9706758Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:38:31.9707162Z 2025-12-04T13:38:31.9707238Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9707603Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9707889Z 2025-12-04T13:38:31.9707978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9708108Z 2025-12-04T13:38:31.9708171Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9708312Z Traceback (most recent call last): 2025-12-04T13:38:31.9708568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9708816Z getattr(self, test_name)() 2025-12-04T13:38:31.9709051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9709288Z fn() 2025-12-04T13:38:31.9709490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9709768Z method(*args, **kwargs) 2025-12-04T13:38:31.9709992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9710221Z method(*args, **kwargs) 2025-12-04T13:38:31.9710457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9710688Z with policy(): 2025-12-04T13:38:31.9710901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9711133Z raise RuntimeError(msg) 2025-12-04T13:38:31.9711572Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:38:31.9711989Z 2025-12-04T13:38:31.9712065Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9712428Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9712725Z 2025-12-04T13:38:31.9712820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9712943Z 2025-12-04T13:38:31.9712945Z 2025-12-04T13:38:31.9713027Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9713232Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9713590Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-27724013ff9ffa03.xml - 2025-12-04T13:38:31.9713922Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9714289Z FAILED [8.0172s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9714632Z Traceback (most recent call last): 2025-12-04T13:38:31.9714883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9715126Z getattr(self, test_name)() 2025-12-04T13:38:31.9715362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9715597Z fn() 2025-12-04T13:38:31.9715802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9716033Z method(*args, **kwargs) 2025-12-04T13:38:31.9716259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9716489Z method(*args, **kwargs) 2025-12-04T13:38:31.9716707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9716938Z with policy(): 2025-12-04T13:38:31.9717168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9717403Z raise RuntimeError(msg) 2025-12-04T13:38:31.9717842Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9718248Z 2025-12-04T13:38:31.9718328Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9718691Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9718974Z 2025-12-04T13:38:31.9719073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9719200Z 2025-12-04T13:38:31.9719260Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9719401Z Traceback (most recent call last): 2025-12-04T13:38:31.9719671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9719915Z getattr(self, test_name)() 2025-12-04T13:38:31.9720148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9720381Z fn() 2025-12-04T13:38:31.9720607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9720838Z method(*args, **kwargs) 2025-12-04T13:38:31.9721058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9721303Z method(*args, **kwargs) 2025-12-04T13:38:31.9721525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9721752Z with policy(): 2025-12-04T13:38:31.9721966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9722201Z raise RuntimeError(msg) 2025-12-04T13:38:31.9722651Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9723062Z 2025-12-04T13:38:31.9723137Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9723499Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9723781Z 2025-12-04T13:38:31.9723870Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9724059Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9724226Z ======================= 1 failed, 32 deselected in 8.18s ======================= 2025-12-04T13:38:31.9724364Z Got exit code 1 2025-12-04T13:38:31.9724463Z Retrying single test... 
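[editor's note] The UserWarnings in the run above point at two clean-ups that would silence them: bind each rank to its GPU (torch.cuda.set_device, or an explicit index in FSDP's device_id argument) before FSDP initialization, and call torch.distributed.destroy_process_group() before the process exits so ProcessGroupNCCL does not warn about leaked resources. A minimal per-rank sketch along those lines, assuming model and rank are supplied by the caller:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
        # Bind this rank to its GPU before FSDP init so device_id is unambiguous,
        # as the warning in the log recommends.
        torch.cuda.set_device(rank)
        return FSDP(model, device_id=rank)  # explicit index instead of bare "cuda"

    def shutdown() -> None:
        # Avoid the "destroy_process_group() was not called" warning at exit.
        if dist.is_initialized():
            dist.destroy_process_group()

This mirrors the remediation suggested by the warnings themselves; it is not a claim about what the test under retry should do.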
2025-12-04T13:38:31.9724719Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7256f71ab4cb42b0.xml 2025-12-04T13:38:31.9725004Z ============================= test session starts ============================== 2025-12-04T13:38:31.9725217Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9725409Z cachedir: .pytest_cache 2025-12-04T13:38:31.9725650Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9725894Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9726015Z configfile: pytest.ini 2025-12-04T13:38:31.9726242Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9726518Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9726764Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9726809Z Running 1 items in this shard 2025-12-04T13:38:31.9726811Z 2025-12-04T13:38:31.9727155Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda I1204 12:57:51.294000 379393 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 379462 2025-12-04T13:38:31.9727319Z I1204 12:57:51.295000 379393 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 379463 2025-12-04T13:38:31.9727469Z I1204 12:57:51.295000 379393 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 379464 2025-12-04T13:38:31.9727621Z I1204 12:57:51.296000 379393 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 379465 2025-12-04T13:38:31.9727980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9728042Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9728536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9728611Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9728969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9729018Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9729512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. 
FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9729612Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9729968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9730013Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9730501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9730563Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9730932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9730980Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9731467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9731529Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9731694Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9731860Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9732153Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9732308Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9732614Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9732741Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9733034Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9733182Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9733461Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9733610Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9733887Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9734027Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9734303Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9734453Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9734961Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9735079Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9735276Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9735651Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9735767Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9735986Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9736155Z [rank1]:E1204 12:57:57.268000 379463 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9736194Z dist init r=1, world=4 2025-12-04T13:38:31.9736333Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9736491Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9736792Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9736946Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9737243Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9737371Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9737652Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9737804Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9738080Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9738230Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9738511Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9738649Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:31.9738934Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9739083Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9739630Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9739751Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9739948Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9740344Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9740460Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9740675Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9740839Z [rank0]:E1204 12:57:57.271000 379462 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9740895Z dist init r=0, world=4 2025-12-04T13:38:31.9741032Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9741198Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9741498Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9741654Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9741941Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9742066Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9742351Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9742499Z [rank2]:E1204 12:57:57.274000 379464 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9742778Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9742925Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9743206Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9743344Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9743633Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9743785Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9744276Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9744396Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9744606Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9744980Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9745095Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9745322Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9745490Z [rank2]:E1204 12:57:57.274000 379464 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9745530Z dist init r=2, world=4 2025-12-04T13:38:31.9745683Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9745845Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9746140Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:31.9746299Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9746587Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9746715Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9746997Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9747147Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9747425Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9747577Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9747872Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9748009Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9748291Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9748440Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9748945Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:38:31.9749060Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9749260Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9749899Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9750027Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9750242Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9750420Z [rank3]:E1204 12:57:57.319000 379465 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9750463Z dist init r=3, world=4 2025-12-04T13:38:31.9750805Z [rank0]:[W1204 12:57:57.438701986 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9750850Z FAILED [7.7160s] [100%] 2025-12-04T13:38:31.9750853Z 2025-12-04T13:38:31.9750911Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9751030Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda _ 2025-12-04T13:38:31.9751078Z Traceback (most recent call last): 2025-12-04T13:38:31.9751246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9751294Z self._join_processes(fn) 2025-12-04T13:38:31.9751468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9751525Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9751704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9751753Z raise RuntimeError(error) 2025-12-04T13:38:31.9751833Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9751882Z Traceback (most recent call last): 2025-12-04T13:38:31.9752044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9752091Z getattr(self, test_name)() 2025-12-04T13:38:31.9752264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9752302Z fn() 2025-12-04T13:38:31.9752454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9752500Z method(*args, **kwargs) 2025-12-04T13:38:31.9752651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9752697Z method(*args, **kwargs) 
2025-12-04T13:38:31.9752850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9752893Z with policy(): 2025-12-04T13:38:31.9753064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9753111Z raise RuntimeError(msg) 2025-12-04T13:38:31.9753480Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9753486Z 2025-12-04T13:38:31.9753564Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9753816Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9753832Z 2025-12-04T13:38:31.9753920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9753922Z 2025-12-04T13:38:31.9753988Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9754048Z Traceback (most recent call last): 2025-12-04T13:38:31.9754216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9754259Z getattr(self, test_name)() 2025-12-04T13:38:31.9754422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9754457Z fn() 2025-12-04T13:38:31.9754614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9754656Z method(*args, **kwargs) 2025-12-04T13:38:31.9754813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9754852Z method(*args, **kwargs) 2025-12-04T13:38:31.9755006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9755045Z with policy(): 2025-12-04T13:38:31.9755201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9755243Z raise RuntimeError(msg) 2025-12-04T13:38:31.9755616Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9755620Z 2025-12-04T13:38:31.9755698Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9755944Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9755946Z 2025-12-04T13:38:31.9756038Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9756041Z 2025-12-04T13:38:31.9756124Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9756174Z Traceback (most recent call last): 2025-12-04T13:38:31.9756337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9756383Z getattr(self, test_name)() 2025-12-04T13:38:31.9756543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9756583Z fn() 2025-12-04T13:38:31.9756733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9756777Z method(*args, **kwargs) 2025-12-04T13:38:31.9756931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9756988Z method(*args, **kwargs) 2025-12-04T13:38:31.9757145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9757185Z with policy(): 2025-12-04T13:38:31.9757340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9757380Z raise RuntimeError(msg) 2025-12-04T13:38:31.9757753Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9757766Z 2025-12-04T13:38:31.9757840Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9758090Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9758103Z 2025-12-04T13:38:31.9758188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9758190Z 2025-12-04T13:38:31.9758192Z 2025-12-04T13:38:31.9758270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9758360Z Process 0 terminated with exit code 10, terminating remaining processes. 
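Each failing rank prints the same repro command, so the failure can be re-run outside the harness. A small convenience sketch that repeats that command a few times from a pytorch checkout to see whether the leak is consistent (the command and environment variables are copied from the log above; the retry loop is a local convenience, not what the CI harness does):

    import os
    import subprocess

    env = dict(os.environ,
               PYTORCH_TEST_WITH_ROCM="1",
               PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    cmd = ["python", "test/distributed/fsdp/test_fsdp_core.py",
           "TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda"]
    for attempt in range(3):
        # Run from the base repo dir, as the failure message instructs.
        result = subprocess.run(cmd, env=env)
        print(f"attempt {attempt}: exit code {result.returncode}")

Setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 in the same environment suppresses the repro message, as the log notes.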
2025-12-04T13:38:31.9758596Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7256f71ab4cb42b0.xml - 2025-12-04T13:38:31.9758664Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9758933Z FAILED [7.7160s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9758982Z Traceback (most recent call last): 2025-12-04T13:38:31.9759147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9759192Z getattr(self, test_name)() 2025-12-04T13:38:31.9759353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9759391Z fn() 2025-12-04T13:38:31.9759542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9759801Z method(*args, **kwargs) 2025-12-04T13:38:31.9759953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9759996Z method(*args, **kwargs) 2025-12-04T13:38:31.9760147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9760203Z with policy(): 2025-12-04T13:38:31.9760355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9760398Z raise RuntimeError(msg) 2025-12-04T13:38:31.9760767Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
2025-12-04T13:38:31.9760772Z 2025-12-04T13:38:31.9760845Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9761116Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9761119Z 2025-12-04T13:38:31.9761208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9761210Z 2025-12-04T13:38:31.9761272Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9761317Z Traceback (most recent call last): 2025-12-04T13:38:31.9761484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9761526Z getattr(self, test_name)() 2025-12-04T13:38:31.9761688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9761736Z fn() 2025-12-04T13:38:31.9761889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9761929Z method(*args, **kwargs) 2025-12-04T13:38:31.9762084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9762139Z method(*args, **kwargs) 2025-12-04T13:38:31.9762293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9762331Z with policy(): 2025-12-04T13:38:31.9762487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9762532Z raise RuntimeError(msg) 2025-12-04T13:38:31.9762898Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9762902Z 2025-12-04T13:38:31.9762980Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9763229Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9763232Z 2025-12-04T13:38:31.9763324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9763326Z 2025-12-04T13:38:31.9763384Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9763433Z Traceback (most recent call last): 2025-12-04T13:38:31.9763595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9763641Z getattr(self, test_name)() 2025-12-04T13:38:31.9763803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9763841Z fn() 2025-12-04T13:38:31.9763995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9764039Z method(*args, **kwargs) 2025-12-04T13:38:31.9764208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9764248Z method(*args, **kwargs) 2025-12-04T13:38:31.9764403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9764440Z with policy(): 2025-12-04T13:38:31.9764595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9764638Z raise RuntimeError(msg) 2025-12-04T13:38:31.9765027Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9765031Z 2025-12-04T13:38:31.9765106Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9765356Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9765358Z 2025-12-04T13:38:31.9765444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9765511Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:31.9765587Z ======================= 1 failed, 32 deselected in 7.88s ======================= 2025-12-04T13:38:31.9765628Z Got exit code 1 2025-12-04T13:38:31.9765831Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda 2025-12-04T13:38:31.9765962Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:31.9766168Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-147ebbffa2c93fc5.xml 2025-12-04T13:38:31.9766229Z ============================= test session starts ============================== 2025-12-04T13:38:31.9766346Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9766388Z cachedir: .pytest_cache 2025-12-04T13:38:31.9766550Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9766598Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9766643Z configfile: pytest.ini 2025-12-04T13:38:31.9766805Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9766884Z collecting ... collected 60 items / 2 deselected / 58 selected 2025-12-04T13:38:31.9766937Z stepcurrent: skipping 2 already run items. 2025-12-04T13:38:31.9766985Z Running 31 items in this shard 2025-12-04T13:38:31.9766987Z 2025-12-04T13:38:31.9767314Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 12:58:01.368000 379779 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 379848 2025-12-04T13:38:31.9767474Z I1204 12:58:01.369000 379779 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 379849 2025-12-04T13:38:31.9767628Z I1204 12:58:01.369000 379779 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 379850 2025-12-04T13:38:31.9767782Z I1204 12:58:01.370000 379779 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 379851 2025-12-04T13:38:31.9768163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9768213Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9768569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9768617Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9768910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9768986Z {} 2025-12-04T13:38:31.9769096Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
2025-12-04T13:38:31.9769174Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9769793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9769875Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9770161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9770229Z {} 2025-12-04T13:38:31.9770352Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9770428Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9770916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9770983Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9771339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9771389Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9771683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9771745Z {} 2025-12-04T13:38:31.9771850Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9771923Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9772415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9772489Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9772845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9772891Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9773179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9773245Z {} 2025-12-04T13:38:31.9773344Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9773432Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9773921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9773985Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9774130Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9774308Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9774600Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9774768Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9775056Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9775182Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9775465Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9775614Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9775898Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9776046Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9776329Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9776470Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9776748Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9776911Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9777403Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9777523Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9777722Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9778112Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9778231Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9778446Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9778624Z [rank2]:E1204 12:58:07.410000 379850 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9778763Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9778937Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9779224Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9779381Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9779720Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9779844Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9780127Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9780276Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9780559Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9780707Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9780987Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9781142Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9781422Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9781572Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9782062Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9782193Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9782390Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9782765Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9782894Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9783107Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9783278Z [rank1]:E1204 12:58:07.410000 379849 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9783335Z dist init r=2, world=4 2025-12-04T13:38:31.9783378Z dist init r=1, world=4 2025-12-04T13:38:31.9783515Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9783675Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9783961Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9784121Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9784409Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9784533Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9784819Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9784966Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9785250Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9785408Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9785689Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9785825Z [rank3]:E1204 12:58:07.412000 379851 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9786107Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9786261Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9786761Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9786880Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9787074Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9787465Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9787592Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9787800Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9787965Z [rank3]:E1204 12:58:07.412000 379851 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9788003Z dist init r=3, world=4 2025-12-04T13:38:31.9788142Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9788301Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9788592Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9788747Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9789036Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9789163Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9789442Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9789636Z 
[rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9789930Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9790082Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9790375Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9790517Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9790815Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9790965Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9791459Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9791586Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9791784Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9792170Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9792285Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9792500Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9792666Z [rank0]:E1204 12:58:07.456000 379848 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9792709Z dist init r=0, world=4 2025-12-04T13:38:31.9793046Z [rank0]:[W1204 12:58:07.711940587 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9793090Z FAILED [7.8155s] [ 3%] 2025-12-04T13:38:31.9793092Z 2025-12-04T13:38:31.9793150Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9793269Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T13:38:31.9793317Z Traceback (most recent call last): 2025-12-04T13:38:31.9793485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9793532Z self._join_processes(fn) 2025-12-04T13:38:31.9793708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9793763Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9793959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9794008Z raise RuntimeError(error) 2025-12-04T13:38:31.9794088Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9794139Z Traceback (most recent call last): 2025-12-04T13:38:31.9794303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9794349Z getattr(self, test_name)() 2025-12-04T13:38:31.9794510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9794550Z fn() 2025-12-04T13:38:31.9794704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9794750Z method(*args, **kwargs) 2025-12-04T13:38:31.9794913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9794959Z method(*args, **kwargs) 2025-12-04T13:38:31.9795112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9795154Z with policy(): 2025-12-04T13:38:31.9795310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9795357Z raise RuntimeError(msg) 2025-12-04T13:38:31.9795739Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9795741Z 2025-12-04T13:38:31.9795822Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9796081Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9796087Z 2025-12-04T13:38:31.9796176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9796178Z 2025-12-04T13:38:31.9796241Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9796288Z Traceback (most recent call last): 2025-12-04T13:38:31.9796458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9796503Z getattr(self, test_name)() 2025-12-04T13:38:31.9796667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9796703Z fn() 2025-12-04T13:38:31.9796860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9796901Z method(*args, **kwargs) 2025-12-04T13:38:31.9797056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9797097Z method(*args, **kwargs) 2025-12-04T13:38:31.9797255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9797292Z with policy(): 2025-12-04T13:38:31.9797450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9797491Z raise RuntimeError(msg) 2025-12-04T13:38:31.9797866Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:38:31.9797879Z 2025-12-04T13:38:31.9797957Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9798204Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9798206Z 2025-12-04T13:38:31.9798297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9798300Z 2025-12-04T13:38:31.9798360Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9798409Z Traceback (most recent call last): 2025-12-04T13:38:31.9798574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9798621Z getattr(self, test_name)() 2025-12-04T13:38:31.9798792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9798833Z fn() 2025-12-04T13:38:31.9798987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9799031Z method(*args, **kwargs) 2025-12-04T13:38:31.9799182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9799225Z method(*args, **kwargs) 2025-12-04T13:38:31.9799390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9799431Z with policy(): 2025-12-04T13:38:31.9799622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9799668Z raise RuntimeError(msg) 2025-12-04T13:38:31.9800061Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9800064Z 2025-12-04T13:38:31.9800139Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9800387Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9800390Z 2025-12-04T13:38:31.9800476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9800479Z 2025-12-04T13:38:31.9800480Z 2025-12-04T13:38:31.9800561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9800650Z Process 1 terminated with exit code 10, terminating remaining processes. 
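The UserWarning from torch/nn/modules/transformer.py:144 repeated in this run fires because the encoder layer is built with the default batch_first=False, which disables the nested-tensor fast path. A minimal sketch of the batch-first construction the warning points at (an illustrative toy model, not the test's actual transformer):

    import torch
    from torch import nn

    # batch_first=True keeps enable_nested_tensor usable and avoids the warning.
    layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)

    out = encoder(torch.randn(2, 10, 32))  # (batch, seq, feature) because batch_first=True
    print(out.shape)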
2025-12-04T13:38:31.9800889Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-147ebbffa2c93fc5.xml - 2025-12-04T13:38:31.9800952Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9801219Z FAILED [7.8155s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:31.9801270Z Traceback (most recent call last): 2025-12-04T13:38:31.9801434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9801482Z getattr(self, test_name)() 2025-12-04T13:38:31.9801646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9801685Z fn() 2025-12-04T13:38:31.9801852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9801897Z method(*args, **kwargs) 2025-12-04T13:38:31.9802050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9802093Z method(*args, **kwargs) 2025-12-04T13:38:31.9802246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9802288Z with policy(): 2025-12-04T13:38:31.9802441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9802487Z raise RuntimeError(msg) 2025-12-04T13:38:31.9802872Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9802876Z 2025-12-04T13:38:31.9802955Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9803205Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9803207Z 2025-12-04T13:38:31.9803294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9803311Z 2025-12-04T13:38:31.9803374Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9803419Z Traceback (most recent call last): 2025-12-04T13:38:31.9803585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9803640Z getattr(self, test_name)() 2025-12-04T13:38:31.9803804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9803839Z fn() 2025-12-04T13:38:31.9803995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9804035Z method(*args, **kwargs) 2025-12-04T13:38:31.9804188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9804229Z method(*args, **kwargs) 2025-12-04T13:38:31.9804384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9804421Z with policy(): 2025-12-04T13:38:31.9804578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9804622Z raise RuntimeError(msg) 2025-12-04T13:38:31.9804994Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:38:31.9804996Z 2025-12-04T13:38:31.9805074Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9805318Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9805321Z 2025-12-04T13:38:31.9805413Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9805415Z 2025-12-04T13:38:31.9805473Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9805523Z Traceback (most recent call last): 2025-12-04T13:38:31.9805699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9805745Z getattr(self, test_name)() 2025-12-04T13:38:31.9805904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9805943Z fn() 2025-12-04T13:38:31.9806095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9806139Z method(*args, **kwargs) 2025-12-04T13:38:31.9806290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9806333Z method(*args, **kwargs) 2025-12-04T13:38:31.9806483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9806536Z with policy(): 2025-12-04T13:38:31.9806690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9806736Z raise RuntimeError(msg) 2025-12-04T13:38:31.9807106Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9807109Z 2025-12-04T13:38:31.9807193Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9807439Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9807441Z 2025-12-04T13:38:31.9807528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9807610Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9807674Z ======================= 1 failed, 2 deselected in 7.98s ======================== 2025-12-04T13:38:31.9807716Z Got exit code 1 2025-12-04T13:38:31.9807757Z Retrying single test... 
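Every run above also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. A minimal cleanup sketch following that guidance, assuming a single process and the gloo backend so it runs without a GPU (the address and port are arbitrary placeholders):

    import torch.distributed as dist

    dist.init_process_group(backend="gloo",
                            init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)
    try:
        pass  # collectives / FSDP work would go here
    finally:
        # Tear the group down before the process exits, per the warning above.
        dist.destroy_process_group()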
2025-12-04T13:38:31.9807952Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-206ea0eb46205b47.xml 2025-12-04T13:38:31.9808011Z ============================= test session starts ============================== 2025-12-04T13:38:31.9808131Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9808173Z cachedir: .pytest_cache 2025-12-04T13:38:31.9808337Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9808385Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9808431Z configfile: pytest.ini 2025-12-04T13:38:31.9808601Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9808675Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9808927Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9808972Z Running 1 items in this shard 2025-12-04T13:38:31.9808975Z 2025-12-04T13:38:31.9809307Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 12:58:11.565000 380165 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 380234 2025-12-04T13:38:31.9809463Z I1204 12:58:11.566000 380165 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 380235 2025-12-04T13:38:31.9809774Z I1204 12:58:11.566000 380165 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 380236 2025-12-04T13:38:31.9809925Z I1204 12:58:11.567000 380165 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 380237 2025-12-04T13:38:31.9810288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9810343Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9810707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9810759Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9811052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9811121Z {} 2025-12-04T13:38:31.9811226Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9811303Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9811812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9811891Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9812182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9812246Z {} 2025-12-04T13:38:31.9812354Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9812426Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9812921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9812983Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9813340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9813385Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9813738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9813786Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9814069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9814144Z {} 2025-12-04T13:38:31.9814245Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9814317Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9814806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9814869Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9815166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9815230Z {} 2025-12-04T13:38:31.9815329Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
2025-12-04T13:38:31.9815403Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9815896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9815965Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9816116Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9816292Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9816589Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9816747Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9817040Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9817169Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9817448Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9817604Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9817883Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9818037Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9818316Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9818475Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9818753Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9818905Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9819413Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9819529Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9819776Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9820153Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9820284Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9820498Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9820677Z [rank2]:E1204 12:58:17.636000 380236 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9820717Z dist init r=2, world=4 2025-12-04T13:38:31.9820855Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9821019Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9821308Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9821465Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9821754Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9821883Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9822165Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9822313Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9822593Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9822756Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9823036Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9823173Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9823455Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9823622Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9824116Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9824237Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9824434Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9824818Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9824941Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9825153Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9825319Z [rank0]:E1204 12:58:17.695000 380234 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9825360Z dist init r=0, world=4 2025-12-04T13:38:31.9825500Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9825658Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9825946Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9826100Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9826388Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9826513Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9826794Z [rank1]:E1204 12:58:17.719000 380235 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9826959Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9827237Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9827386Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9827663Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9827804Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9828097Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9828250Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9828739Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
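The transformer.py UserWarning repeated in the session output above explains why the nested-tensor fast path is disabled: enable_nested_tensor is True on nn.TransformerEncoder, but it only takes effect when the encoder layer is built with batch_first=True. A minimal sketch with illustrative sizes (d_model, nhead, and num_layers are assumptions, not values taken from the test):

import torch.nn as nn

# batch_first=True keeps the nested-tensor path usable and silences the warning.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)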
2025-12-04T13:38:31.9828862Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9829058Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9829442Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9829557Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9829804Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9829969Z [rank1]:E1204 12:58:17.719000 380235 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9830011Z dist init r=1, world=4 2025-12-04T13:38:31.9830149Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9830312Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9830599Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9830757Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9831043Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9831171Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9831465Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9831617Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9831894Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9832040Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9832329Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9832465Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9832750Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9832898Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9833405Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9833543Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9833738Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9834114Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9834227Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9834444Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9834609Z [rank3]:E1204 12:58:17.727000 380237 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9834654Z dist init r=3, world=4 2025-12-04T13:38:31.9834997Z [rank0]:[W1204 12:58:17.963404852 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9835038Z FAILED [7.8186s] [100%] 2025-12-04T13:38:31.9835041Z 2025-12-04T13:38:31.9835102Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9835218Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T13:38:31.9835269Z Traceback (most recent call last): 2025-12-04T13:38:31.9835433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9835493Z self._join_processes(fn) 2025-12-04T13:38:31.9835667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9835727Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9835905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9835954Z raise RuntimeError(error) 2025-12-04T13:38:31.9836035Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9836085Z Traceback (most recent call last): 2025-12-04T13:38:31.9836247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9836295Z getattr(self, test_name)() 2025-12-04T13:38:31.9836466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9836506Z fn() 2025-12-04T13:38:31.9836657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9836703Z method(*args, **kwargs) 2025-12-04T13:38:31.9836861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9836903Z method(*args, **kwargs) 2025-12-04T13:38:31.9837069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9837108Z with policy(): 2025-12-04T13:38:31.9837266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9837308Z raise RuntimeError(msg) 2025-12-04T13:38:31.9837690Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
2025-12-04T13:38:31.9837692Z 2025-12-04T13:38:31.9837767Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9862784Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9862796Z 2025-12-04T13:38:31.9862910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9862912Z 2025-12-04T13:38:31.9862913Z 2025-12-04T13:38:31.9862996Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9863091Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9863331Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-206ea0eb46205b47.xml - 2025-12-04T13:38:31.9863396Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9863661Z FAILED [7.8186s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9863713Z Traceback (most recent call last): 2025-12-04T13:38:31.9863882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9863929Z getattr(self, test_name)() 2025-12-04T13:38:31.9864091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9864131Z fn() 2025-12-04T13:38:31.9864324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9864367Z method(*args, **kwargs) 2025-12-04T13:38:31.9864520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9864562Z method(*args, **kwargs) 2025-12-04T13:38:31.9864713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9864754Z with policy(): 2025-12-04T13:38:31.9864905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9864948Z raise RuntimeError(msg) 2025-12-04T13:38:31.9865332Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9865338Z 2025-12-04T13:38:31.9865414Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9865662Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9865664Z 2025-12-04T13:38:31.9865766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9865832Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
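The _init_utils.py UserWarning quoted in the run above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") suggests two remedies: call torch.cuda.set_device() before constructing FSDP, or pass a fully indexed device as device_id. A minimal sketch of both, assuming the process group is already initialized and build_model() is a hypothetical stand-in for the wrapped module:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

rank = dist.get_rank()
torch.cuda.set_device(rank)                                      # remedy 1: make the current device explicit

model = build_model()                                            # hypothetical model constructor
fsdp_model = FSDP(model, device_id=torch.device("cuda", rank))   # remedy 2: pass an indexed device_id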
2025-12-04T13:38:31.9865894Z ======================= 1 failed, 32 deselected in 7.98s ======================= 2025-12-04T13:38:31.9865933Z Got exit code 1 2025-12-04T13:38:31.9865973Z Retrying single test... 2025-12-04T13:38:31.9866182Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a27cf1bb561c9e5.xml 2025-12-04T13:38:31.9866242Z ============================= test session starts ============================== 2025-12-04T13:38:31.9866360Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9866400Z cachedir: .pytest_cache 2025-12-04T13:38:31.9866565Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9866613Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9866658Z configfile: pytest.ini 2025-12-04T13:38:31.9866823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9866899Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9867144Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9867191Z Running 1 items in this shard 2025-12-04T13:38:31.9867193Z 2025-12-04T13:38:31.9867517Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda I1204 12:58:22.041000 380551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 380620 2025-12-04T13:38:31.9867673Z I1204 12:58:22.042000 380551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 380621 2025-12-04T13:38:31.9867827Z I1204 12:58:22.042000 380551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 380622 2025-12-04T13:38:31.9867977Z I1204 12:58:22.043000 380551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 380623 2025-12-04T13:38:31.9868353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9868402Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9868693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9868761Z {} 2025-12-04T13:38:31.9868865Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9868942Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9869450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9869515Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9869907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9869973Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9870260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9870338Z {} 2025-12-04T13:38:31.9870444Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9870518Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9871008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9871069Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9871427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9871474Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9871763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9871824Z {} 2025-12-04T13:38:31.9871929Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9872000Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9872494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9872570Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9872922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9872971Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9873254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:31.9873318Z {} 2025-12-04T13:38:31.9873417Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:31.9873505Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:31.9873991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9874053Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9874219Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9874382Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9874678Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9874844Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9875133Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9875258Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9875540Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9875692Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9875969Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9876118Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9876392Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9876531Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9876822Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9876974Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9877467Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9877583Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9877790Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9878170Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9878286Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9878498Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9878680Z [rank3]:E1204 12:58:28.172000 380623 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9878722Z dist init r=3, world=4 2025-12-04T13:38:31.9878860Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9879034Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9879321Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9879476Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9879794Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9879922Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9880201Z [rank1]:E1204 12:58:28.176000 380621 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9880351Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9880630Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9880777Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9881056Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9881205Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9881484Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9881631Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9882135Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9882253Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9882449Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9882824Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9882949Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9883162Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9883340Z [rank1]:E1204 12:58:28.176000 380621 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9883381Z dist init r=1, world=4 2025-12-04T13:38:31.9883521Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9883680Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9883968Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9884121Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9884407Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9884530Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9884808Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9884955Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9885232Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9885392Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9885668Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9885804Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9886082Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9886235Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9886746Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9886861Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9887058Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9887441Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9887567Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9887776Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9887942Z [rank0]:E1204 12:58:28.217000 380620 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9887980Z dist init r=0, world=4 2025-12-04T13:38:31.9888121Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9888285Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9888577Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9888733Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9889017Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9889141Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9889419Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9889610Z [rank2]:E1204 12:58:28.230000 380622 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9889925Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9890071Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9890349Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9890486Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9890791Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9890940Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9891438Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9891571Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9891765Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9892149Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9892261Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9892473Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9892636Z [rank2]:E1204 12:58:28.230000 380622 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9892677Z dist init r=2, world=4 2025-12-04T13:38:31.9893016Z [rank0]:[W1204 12:58:28.495452182 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9893058Z FAILED [7.8180s] [100%] 2025-12-04T13:38:31.9893060Z 2025-12-04T13:38:31.9893120Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9893234Z _ TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda _ 2025-12-04T13:38:31.9893281Z Traceback (most recent call last): 2025-12-04T13:38:31.9893446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9893492Z self._join_processes(fn) 2025-12-04T13:38:31.9893664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9893722Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9893909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9893955Z raise RuntimeError(error) 2025-12-04T13:38:31.9894035Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9894081Z Traceback (most recent call last): 2025-12-04T13:38:31.9894243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9894289Z getattr(self, test_name)() 2025-12-04T13:38:31.9894446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9894482Z fn() 2025-12-04T13:38:31.9894634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9894689Z method(*args, **kwargs) 2025-12-04T13:38:31.9894841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9894885Z method(*args, **kwargs) 2025-12-04T13:38:31.9895037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9895077Z with policy(): 2025-12-04T13:38:31.9897008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9897069Z raise RuntimeError(msg) 2025-12-04T13:38:31.9897438Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:38:31.9897460Z 2025-12-04T13:38:31.9897535Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9897785Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9897788Z 2025-12-04T13:38:31.9897875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9897878Z 2025-12-04T13:38:31.9897879Z 2025-12-04T13:38:31.9897957Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9898045Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9898282Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9a27cf1bb561c9e5.xml - 2025-12-04T13:38:31.9898346Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9898610Z FAILED [7.8180s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:31.9898660Z Traceback (most recent call last): 2025-12-04T13:38:31.9898825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9898870Z getattr(self, test_name)() 2025-12-04T13:38:31.9899031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9899069Z fn() 2025-12-04T13:38:31.9899221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9899265Z method(*args, **kwargs) 2025-12-04T13:38:31.9899417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9899460Z method(*args, **kwargs) 2025-12-04T13:38:31.9899658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9899699Z with policy(): 2025-12-04T13:38:31.9899851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9899894Z raise RuntimeError(msg) 2025-12-04T13:38:31.9900261Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9900266Z 2025-12-04T13:38:31.9900341Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9900609Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9900611Z 2025-12-04T13:38:31.9900697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9900763Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
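The failure above comes from PyTorch's test-time CUDA memory-leak check, which records caching-allocator and driver-level memory usage on each device before the test body and raises if either number has grown afterwards; those are the two quantities quoted in the RuntimeError ("Caching allocator allocated memory" and "CUDA driver allocated memory"). The snippet below is a minimal sketch of that kind of before/after comparison, not the harness's actual implementation (which lives in torch/testing/_internal/common_utils.py); the helper names snapshot and check_for_leak are illustrative only. On ROCm builds the torch.cuda API maps to HIP, so the same calls apply.

    # Illustrative sketch of a before/after memory comparison, not the real leak check.
    import torch

    def snapshot(device: int):
        torch.cuda.synchronize(device)
        allocator_bytes = torch.cuda.memory_allocated(device)      # caching-allocator view
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level view
        return allocator_bytes, total_bytes - free_bytes

    def check_for_leak(fn, device: int = 0):
        before_alloc, before_driver = snapshot(device)
        fn()                                                        # code under test
        torch.cuda.empty_cache()                                    # drop cached blocks first
        after_alloc, after_driver = snapshot(device)
        if after_alloc > before_alloc or after_driver > before_driver:
            raise RuntimeError(
                f"possible leak on device {device}: "
                f"allocator {before_alloc} -> {after_alloc}, "
                f"driver {before_driver} -> {after_driver}"
            )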
2025-12-04T13:38:31.9900826Z ======================= 1 failed, 32 deselected in 7.96s ======================= 2025-12-04T13:38:31.9900868Z Got exit code 1 2025-12-04T13:38:31.9901080Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda 2025-12-04T13:38:31.9901210Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:31.9901401Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-daad1a9afcbc47ee.xml 2025-12-04T13:38:31.9901479Z ============================= test session starts ============================== 2025-12-04T13:38:31.9901592Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9901635Z cachedir: .pytest_cache 2025-12-04T13:38:31.9901794Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9901843Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9901883Z configfile: pytest.ini 2025-12-04T13:38:31.9902051Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9902127Z collecting ... collected 60 items / 3 deselected / 57 selected 2025-12-04T13:38:31.9902180Z stepcurrent: skipping 3 already run items. 2025-12-04T13:38:31.9902224Z Running 30 items in this shard 2025-12-04T13:38:31.9902228Z 2025-12-04T13:38:31.9902553Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 12:58:32.635000 380937 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 381006 2025-12-04T13:38:31.9902711Z I1204 12:58:32.635000 380937 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 381007 2025-12-04T13:38:31.9902862Z I1204 12:58:32.636000 380937 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 381008 2025-12-04T13:38:31.9903017Z I1204 12:58:32.636000 380937 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 381009 2025-12-04T13:38:31.9903378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9903430Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9903794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9903840Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9904330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9904394Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9904895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9904957Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9905313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9905370Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9905855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9905928Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9906279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9906328Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9906812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9906874Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9907019Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9907179Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9907477Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9907633Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9907934Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9908059Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9908339Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9908488Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9908765Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9908926Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9909201Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9909340Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9909655Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9909818Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9910312Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
2025-12-04T13:38:31.9910440Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9910636Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9911012Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9911129Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9911340Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9911509Z [rank0]:E1204 12:58:38.657000 381006 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9911549Z dist init r=0, world=4 2025-12-04T13:38:31.9911685Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9911849Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9912136Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9912304Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9912587Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9912712Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9912988Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9913150Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9913430Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9913576Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9913858Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9914010Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9914290Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9914454Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9914945Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9915062Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9915257Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9915632Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9915744Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9915960Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9916125Z [rank3]:E1204 12:58:38.666000 381009 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9916167Z dist init r=3, world=4 2025-12-04T13:38:31.9916305Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9916478Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9916768Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9916921Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9917206Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9917328Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9917615Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9917762Z [rank2]:E1204 12:58:38.704000 381008 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9918038Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9918196Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9918469Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9918619Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9918896Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9919046Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9919534Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9919693Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9919889Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9920257Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9920373Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9920585Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9920763Z [rank2]:E1204 12:58:38.704000 381008 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9920802Z dist init r=2, world=4 2025-12-04T13:38:31.9920945Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9921115Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9921402Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9921562Z 
[rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9921869Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9921994Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9922279Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9922444Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9922726Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9922888Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9923167Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9923303Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9923586Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9923740Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9924234Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9924356Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9924552Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9924931Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9925059Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9925271Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9925443Z [rank1]:E1204 12:58:38.731000 381007 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9925484Z dist init r=1, world=4 2025-12-04T13:38:31.9925831Z [rank0]:[W1204 12:58:38.836695219 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9925875Z FAILED [7.7165s] [ 3%] 2025-12-04T13:38:31.9925877Z 2025-12-04T13:38:31.9925951Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9926070Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T13:38:31.9926125Z Traceback (most recent call last): 2025-12-04T13:38:31.9926291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9926344Z self._join_processes(fn) 2025-12-04T13:38:31.9926520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9926593Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9926777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9926825Z raise RuntimeError(error) 2025-12-04T13:38:31.9926912Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9926973Z Traceback (most recent call last): 2025-12-04T13:38:31.9927147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9927194Z getattr(self, test_name)() 2025-12-04T13:38:31.9927364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9927404Z fn() 2025-12-04T13:38:31.9927567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9927614Z method(*args, **kwargs) 2025-12-04T13:38:31.9927778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9927823Z method(*args, **kwargs) 
2025-12-04T13:38:31.9927986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9928028Z with policy(): 2025-12-04T13:38:31.9928193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9928239Z raise RuntimeError(msg) 2025-12-04T13:38:31.9928622Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9928625Z 2025-12-04T13:38:31.9928706Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9928963Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9928965Z 2025-12-04T13:38:31.9929066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9929068Z 2025-12-04T13:38:31.9929080Z 2025-12-04T13:38:31.9929160Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9929259Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9929496Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-daad1a9afcbc47ee.xml - 2025-12-04T13:38:31.9929622Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9929889Z FAILED [7.7165s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9929945Z Traceback (most recent call last): 2025-12-04T13:38:31.9930134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9930188Z getattr(self, test_name)() 2025-12-04T13:38:31.9930353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9930399Z fn() 2025-12-04T13:38:31.9930556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9930608Z method(*args, **kwargs) 2025-12-04T13:38:31.9930770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9930828Z method(*args, **kwargs) 2025-12-04T13:38:31.9930990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9931033Z with policy(): 2025-12-04T13:38:31.9931199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9931259Z raise RuntimeError(msg) 2025-12-04T13:38:31.9931634Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! 
Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9931637Z 2025-12-04T13:38:31.9931715Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9931970Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9931974Z 2025-12-04T13:38:31.9932065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9932139Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9932206Z ======================= 1 failed, 3 deselected in 7.88s ======================== 2025-12-04T13:38:31.9932254Z Got exit code 1 2025-12-04T13:38:31.9932299Z Retrying single test... 2025-12-04T13:38:31.9932497Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e061f9bfbb2801c5.xml 2025-12-04T13:38:31.9932566Z ============================= test session starts ============================== 2025-12-04T13:38:31.9932685Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9932738Z cachedir: .pytest_cache 2025-12-04T13:38:31.9932900Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9932956Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9933001Z configfile: pytest.ini 2025-12-04T13:38:31.9933189Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9933269Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9933522Z stepcurrent: skipping 3 already run items. 
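Each failing run above also logs a ProcessGroupNCCL warning that destroy_process_group() was never called before the child process exited, because the leak check raises before any teardown runs. In a standalone script the clean pattern is to tear the group down explicitly. Below is a minimal sketch assuming one process per GPU and that RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are supplied by the launcher (for example torchrun); the training body is a placeholder.

    # Sketch of explicit process-group teardown; env vars assumed set by the launcher.
    import os
    import torch
    import torch.distributed as dist

    def main():
        rank = int(os.environ["RANK"])
        torch.cuda.set_device(rank % torch.cuda.device_count())
        dist.init_process_group(backend="nccl")   # RCCL on ROCm builds
        try:
            pass  # ... test or training body ...
        finally:
            dist.destroy_process_group()          # avoids the shutdown warning seen above

    if __name__ == "__main__":
        main()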
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9933570Z Running 1 items in this shard 2025-12-04T13:38:31.9933572Z 2025-12-04T13:38:31.9933903Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 12:58:42.854000 381323 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 381392 2025-12-04T13:38:31.9934063Z I1204 12:58:42.855000 381323 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 381393 2025-12-04T13:38:31.9934236Z I1204 12:58:42.855000 381323 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 381394 2025-12-04T13:38:31.9934392Z I1204 12:58:42.856000 381323 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 381395 2025-12-04T13:38:31.9934764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9934823Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9935329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9935414Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9935771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9935829Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9936326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9936390Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9936752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9936802Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9937163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9937213Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9937721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9937792Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9938281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9938351Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9938497Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9938680Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9938976Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9939140Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9939436Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9939614Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9939902Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9940071Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9940358Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9940509Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9940798Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9940938Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9941232Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9941392Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9941885Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9942013Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9942226Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9942797Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9942921Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9943138Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9943312Z [rank0]:E1204 12:58:48.839000 381392 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9943372Z dist init r=0, world=4 2025-12-04T13:38:31.9943522Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9943685Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9943984Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9944153Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9944447Z [rank1]:E1204 12:58:48.892000 381393 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9944593Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9944874Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9945032Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9945310Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9945467Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9945748Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9945897Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9946191Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9946343Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9946866Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 
2025-12-04T13:38:31.9946985Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9947191Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9947565Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9947689Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9947923Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9948091Z [rank1]:E1204 12:58:48.892000 381393 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9948144Z dist init r=1, world=4 2025-12-04T13:38:31.9948285Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9948455Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9948760Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9948926Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9949226Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9949359Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9949680Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9949831Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9950118Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9950270Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9950557Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9950698Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9950988Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9951149Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9951655Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9951780Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9951979Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9952376Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9952494Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9952714Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9952889Z [rank2]:E1204 12:58:48.914000 381394 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9952949Z dist init r=2, world=4 2025-12-04T13:38:31.9953097Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9953261Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9953574Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9953730Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9954024Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9954153Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9954438Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9954595Z [rank3]:E1204 12:58:48.925000 381395 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9954873Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9955028Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9955306Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9955451Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9955747Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9955904Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9956403Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 2025-12-04T13:38:31.9956531Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9956737Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9957112Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9957232Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9957464Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9957632Z [rank3]:E1204 12:58:48.925000 381395 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9957693Z dist init r=3, world=4 2025-12-04T13:38:31.9958033Z [rank0]:[W1204 12:58:49.006656144 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9958085Z FAILED [7.6177s] [100%] 2025-12-04T13:38:31.9958088Z 2025-12-04T13:38:31.9958148Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9958272Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T13:38:31.9958323Z Traceback (most recent call last): 2025-12-04T13:38:31.9958497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9958545Z self._join_processes(fn) 2025-12-04T13:38:31.9958730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9958791Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9958981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9959028Z raise RuntimeError(error) 2025-12-04T13:38:31.9959119Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9959169Z Traceback (most recent call last): 2025-12-04T13:38:31.9959342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9959390Z getattr(self, test_name)() 2025-12-04T13:38:31.9959560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9959650Z fn() 2025-12-04T13:38:31.9959809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9959880Z method(*args, **kwargs) 2025-12-04T13:38:31.9960035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9960087Z method(*args, **kwargs) 2025-12-04T13:38:31.9960242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9960291Z with policy(): 2025-12-04T13:38:31.9960450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9960505Z raise RuntimeError(msg) 2025-12-04T13:38:31.9960892Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 
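The ProcessGroupNCCL warning above reports that destroy_process_group() was never called before exit. Below is a minimal illustrative sketch (not taken from the test source) of the explicit teardown that warning asks for, assuming a torchrun-style launch with the default env:// rendezvous; the function and variable names are placeholders.

    # Illustrative sketch (not from this log): explicit process-group teardown,
    # assuming a torchrun launch with the default env:// rendezvous.
    import torch
    import torch.distributed as dist

    def main() -> None:
        dist.init_process_group(backend="nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank % torch.cuda.device_count())
        try:
            pass  # training / test body would run here
        finally:
            # Calling this before exit avoids the "destroy_process_group() was not called"
            # warning above and releases the communicator's resources.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()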
2025-12-04T13:38:31.9960896Z 2025-12-04T13:38:31.9960982Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9961233Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9961235Z 2025-12-04T13:38:31.9961334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9961349Z 2025-12-04T13:38:31.9961350Z 2025-12-04T13:38:31.9961438Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9961529Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9961775Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e061f9bfbb2801c5.xml - 2025-12-04T13:38:31.9961862Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9962133Z FAILED [7.6177s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:31.9962183Z Traceback (most recent call last): 2025-12-04T13:38:31.9962357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9962405Z getattr(self, test_name)() 2025-12-04T13:38:31.9962574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9962614Z fn() 2025-12-04T13:38:31.9962776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9962822Z method(*args, **kwargs) 2025-12-04T13:38:31.9962987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9963032Z method(*args, **kwargs) 2025-12-04T13:38:31.9963194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9963236Z with policy(): 2025-12-04T13:38:31.9963397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9963450Z raise RuntimeError(msg) 2025-12-04T13:38:31.9963819Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9963823Z 2025-12-04T13:38:31.9963912Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9964174Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9964177Z 2025-12-04T13:38:31.9964275Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9964342Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
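The leak report above compares two per-device counters: bytes held by PyTorch's caching allocator and bytes allocated at the driver level. The following is a simplified approximation, using only public torch.cuda APIs, of how such numbers can be sampled; the real check is the one raising from torch/testing/_internal/common_utils.py in the traceback and is more involved.

    # Simplified approximation (not the harness's exact leak check) of the two
    # counters named in the error message above.
    import torch

    def memory_snapshot(device: int) -> tuple[int, int]:
        torch.cuda.synchronize(device)
        caching = torch.cuda.memory_allocated(device)  # bytes held by the caching allocator
        free, total = torch.cuda.mem_get_info(device)  # driver-level free/total bytes
        return caching, total - free                   # (caching allocator, driver allocated)

    before = memory_snapshot(0)
    # ... run the suspected test body here ...
    after = memory_snapshot(0)
    if after[0] > before[0] or after[1] > before[1]:
        print(f"possible leak: caching {before[0]} -> {after[0]}, driver {before[1]} -> {after[1]}")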
2025-12-04T13:38:31.9964415Z ======================= 1 failed, 32 deselected in 7.76s ======================= 2025-12-04T13:38:31.9964460Z Got exit code 1 2025-12-04T13:38:31.9964512Z Retrying single test... 2025-12-04T13:38:31.9964705Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4998051b6cba4043.xml 2025-12-04T13:38:31.9964783Z ============================= test session starts ============================== 2025-12-04T13:38:31.9964902Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9964954Z cachedir: .pytest_cache 2025-12-04T13:38:31.9965116Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9965173Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9965217Z configfile: pytest.ini 2025-12-04T13:38:31.9965392Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9965488Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:31.9965733Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9965794Z Running 1 items in this shard 2025-12-04T13:38:31.9965808Z 2025-12-04T13:38:31.9966134Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda I1204 12:58:53.407000 381709 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 381778 2025-12-04T13:38:31.9966300Z I1204 12:58:53.408000 381709 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 381779 2025-12-04T13:38:31.9966456Z I1204 12:58:53.408000 381709 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 381780 2025-12-04T13:38:31.9966617Z I1204 12:58:53.409000 381709 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 381781 2025-12-04T13:38:31.9966980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9967039Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9967401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9967451Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9967962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9968028Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9968540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9968610Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9968966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9969024Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9969523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:31.9969641Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9969997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:31.9970069Z self.encoder = TransformerEncoder( 2025-12-04T13:38:31.9970569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:31.9970646Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:31.9970801Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9970968Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9971271Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9971431Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9971730Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9971859Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9972147Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9972307Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9972586Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9972760Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9973041Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9973187Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9973470Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9973629Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9974147Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 
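The FSDP UserWarnings above flag a bare `device_id` of "cuda" with no index, and the warning text itself offers two remedies: call torch.cuda.set_device() before FSDP initialization, or pass an indexed device. A short illustrative sketch of both options; `model` and `rank` are placeholders, not names from the test.

    # Illustrative sketch of the two fixes suggested by the FSDP `device_id` warnings above;
    # `model` and `rank` are placeholders.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_fsdp(model: torch.nn.Module, rank: int) -> FSDP:
        torch.cuda.set_device(rank)  # option 1: make the current device explicit first
        # option 2: pass an indexed device instead of the bare "cuda" string
        return FSDP(model, device_id=torch.device("cuda", rank))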
2025-12-04T13:38:31.9974267Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9974472Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9974859Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9974997Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9975219Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9975387Z [rank2]:E1204 12:58:59.462000 381780 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:31.9975439Z dist init r=2, world=4 2025-12-04T13:38:31.9975580Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9975751Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9976042Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9976207Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9976495Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9976628Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9976909Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9977068Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9977368Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9977518Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9977806Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9977946Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:31.9978243Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9978395Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9978893Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3229614080. 2025-12-04T13:38:31.9979038Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9979239Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9979697Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9979814Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9980034Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9980209Z [rank0]:E1204 12:58:59.505000 381778 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:31.9980253Z dist init r=0, world=4 2025-12-04T13:38:31.9980400Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9980565Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9980859Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9981011Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9981299Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9981423Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9981716Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9981864Z [rank1]:E1204 12:58:59.507000 381779 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9982137Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9982286Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9982573Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9982714Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9982991Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9983141Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9983648Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3093299200. 2025-12-04T13:38:31.9983776Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9983973Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9984342Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9984459Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9984670Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9984839Z [rank1]:E1204 12:58:59.507000 381779 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:31.9984880Z dist init r=1, world=4 2025-12-04T13:38:31.9985015Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:31.9985178Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:31.9985463Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9985619Z 
[rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:31.9985913Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9986039Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:31.9986313Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9986462Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9986747Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9986894Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:31.9987171Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9987306Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:31.9987586Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9987746Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:31.9988239Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3026190336. 
2025-12-04T13:38:31.9988364Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9988558Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9988930Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9989042Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:31.9989253Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9989416Z [rank3]:E1204 12:58:59.518000 381781 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:31.9989456Z dist init r=3, world=4 2025-12-04T13:38:31.9989828Z [rank0]:[W1204 12:58:59.754491296 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:31.9989868Z FAILED [7.7187s] [100%] 2025-12-04T13:38:31.9989870Z 2025-12-04T13:38:31.9989930Z =================================== FAILURES =================================== 2025-12-04T13:38:31.9990058Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda _ 2025-12-04T13:38:31.9990108Z Traceback (most recent call last): 2025-12-04T13:38:31.9990270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:31.9990316Z self._join_processes(fn) 2025-12-04T13:38:31.9990488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:31.9990545Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:31.9990723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:31.9990767Z raise RuntimeError(error) 2025-12-04T13:38:31.9990859Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9990909Z Traceback (most recent call last): 2025-12-04T13:38:31.9991070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9991115Z getattr(self, test_name)() 2025-12-04T13:38:31.9991272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9991311Z fn() 2025-12-04T13:38:31.9991464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9991519Z method(*args, **kwargs) 2025-12-04T13:38:31.9991669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9991712Z method(*args, **kwargs) 
2025-12-04T13:38:31.9991866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9991917Z with policy(): 2025-12-04T13:38:31.9992071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9992112Z raise RuntimeError(msg) 2025-12-04T13:38:31.9992478Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9992482Z 2025-12-04T13:38:31.9992557Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9992806Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9992808Z 2025-12-04T13:38:31.9992898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9992900Z 2025-12-04T13:38:31.9992903Z 2025-12-04T13:38:31.9992980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:31.9993066Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:31.9993301Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4998051b6cba4043.xml - 2025-12-04T13:38:31.9993363Z =========================== short test summary info ============================ 2025-12-04T13:38:31.9993622Z FAILED [7.7187s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:31.9993670Z Traceback (most recent call last): 2025-12-04T13:38:31.9993834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:31.9993891Z getattr(self, test_name)() 2025-12-04T13:38:31.9994052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:31.9994088Z fn() 2025-12-04T13:38:31.9994240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9994283Z method(*args, **kwargs) 2025-12-04T13:38:31.9994435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:31.9994479Z method(*args, **kwargs) 2025-12-04T13:38:31.9994631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:31.9994670Z with policy(): 2025-12-04T13:38:31.9994839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:31.9994884Z raise RuntimeError(msg) 2025-12-04T13:38:31.9995246Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda! 
Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3076521984. 2025-12-04T13:38:31.9995251Z 2025-12-04T13:38:31.9995324Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:31.9995583Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9995586Z 2025-12-04T13:38:31.9995672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:31.9995749Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:31.9995811Z ======================= 1 failed, 32 deselected in 7.86s ======================= 2025-12-04T13:38:31.9995850Z Got exit code 1 2025-12-04T13:38:31.9996045Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda 2025-12-04T13:38:31.9996175Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:31.9996362Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d31fb4781e379879.xml 2025-12-04T13:38:31.9996422Z ============================= test session starts ============================== 2025-12-04T13:38:31.9996533Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:31.9996577Z cachedir: .pytest_cache 2025-12-04T13:38:31.9996737Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:31.9996788Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:31.9996828Z configfile: pytest.ini 2025-12-04T13:38:31.9996994Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:31.9997068Z collecting ... collected 60 items / 4 deselected / 56 selected 2025-12-04T13:38:31.9997120Z stepcurrent: skipping 4 already run items. 2025-12-04T13:38:31.9997166Z Running 29 items in this shard 2025-12-04T13:38:31.9997168Z 2025-12-04T13:38:31.9997477Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda I1204 12:59:03.885000 382095 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 382164 2025-12-04T13:38:31.9997636Z I1204 12:59:03.886000 382095 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 382165 2025-12-04T13:38:31.9997799Z I1204 12:59:03.886000 382095 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 382166 2025-12-04T13:38:31.9997951Z I1204 12:59:03.887000 382095 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 382167 2025-12-04T13:38:31.9998245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:31.9998298Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:31.9998900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:31.9998940Z _warn_cpu_init() 2025-12-04T13:38:31.9999230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:31.9999279Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:31.9999887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:31.9999952Z _warn_cpu_init() 2025-12-04T13:38:32.0000240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0000289Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0000573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0000623Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0001201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0001243Z _warn_cpu_init() 2025-12-04T13:38:32.0001807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0001846Z _warn_cpu_init() 2025-12-04T13:38:32.0002138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0002230Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0002517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0002593Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0002879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0002952Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0003250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0003323Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0004593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0004743Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0004973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0005017Z return func(*args, **kwargs) 2025-12-04T13:38:32.0006285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0006410Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0006636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0006680Z return func(*args, **kwargs) 2025-12-04T13:38:32.0007951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0008084Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0008313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0008354Z return func(*args, **kwargs) 2025-12-04T13:38:32.0009647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0009794Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0010017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.0010061Z return func(*args, **kwargs) 2025-12-04T13:38:32.0010282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0010326Z return func(*args, **kwargs) 2025-12-04T13:38:32.0010549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0010590Z return func(*args, **kwargs) 2025-12-04T13:38:32.0010810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0010852Z return func(*args, **kwargs) 2025-12-04T13:38:32.0011070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0011114Z return func(*args, **kwargs) 2025-12-04T13:38:32.0011407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0011464Z return func(*args, **kwargs) 2025-12-04T13:38:32.0011611Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0011774Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0012067Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0012224Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0012526Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0012652Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0012936Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0013087Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0013378Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0013528Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0013815Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0013954Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0014230Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0014380Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0014864Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 2025-12-04T13:38:32.0014980Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0015179Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0015540Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0015657Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0015885Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0016051Z [rank0]:E1204 12:59:34.948000 382164 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0016092Z dist init r=0, world=4 2025-12-04T13:38:32.0016228Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0016389Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0016684Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0016841Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0017125Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0017250Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0017526Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0017686Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0017974Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0018119Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0018397Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0018533Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0018815Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0018962Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0019443Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3810525184. 
2025-12-04T13:38:32.0019558Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0019787Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0020161Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0020274Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0020485Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0020648Z [rank3]:E1204 12:59:34.953000 382167 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0020689Z dist init r=3, world=4 2025-12-04T13:38:32.0020827Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0021000Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0021290Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0021443Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0021730Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0021864Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0022142Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0022300Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0022577Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0022726Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0023001Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0023138Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0023415Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0023565Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0024044Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3860856832. 2025-12-04T13:38:32.0024160Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0024463Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0024821Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0024935Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0025146Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0025323Z [rank2]:E1204 12:59:34.991000 382166 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0025362Z dist init r=2, world=4 2025-12-04T13:38:32.0025500Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0025660Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0025946Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0026110Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0026393Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0026539Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0026815Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0026962Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0027238Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0027386Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0027662Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0027797Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0028074Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0028221Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0028707Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3877634048. 2025-12-04T13:38:32.0028823Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0029018Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0029377Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0029488Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0029767Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0029929Z [rank1]:E1204 12:59:35.000000 382165 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0029970Z dist init r=1, world=4 2025-12-04T13:38:32.0030302Z [rank0]:[W1204 12:59:35.133500092 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0030358Z FAILED [32.9528s] [ 3%] 2025-12-04T13:38:32.0030360Z 2025-12-04T13:38:32.0030419Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0030518Z __ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda ___ 2025-12-04T13:38:32.0030582Z Traceback (most recent call last): 2025-12-04T13:38:32.0030747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0030793Z self._join_processes(fn) 2025-12-04T13:38:32.0030964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0031019Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0031199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0031247Z raise RuntimeError(error) 2025-12-04T13:38:32.0031326Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0031373Z Traceback (most recent call last): 2025-12-04T13:38:32.0031534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0031578Z getattr(self, test_name)() 2025-12-04T13:38:32.0031736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0031772Z fn() 2025-12-04T13:38:32.0031923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0031968Z method(*args, **kwargs) 2025-12-04T13:38:32.0032117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0032161Z method(*args, **kwargs) 2025-12-04T13:38:32.0032311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0032351Z with policy(): 2025-12-04T13:38:32.0032503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0032546Z raise RuntimeError(msg) 2025-12-04T13:38:32.0032915Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 
2025-12-04T13:38:32.0032918Z 2025-12-04T13:38:32.0032994Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0033227Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0033230Z 2025-12-04T13:38:32.0033317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0033320Z 2025-12-04T13:38:32.0033321Z 2025-12-04T13:38:32.0033407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0033498Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0033734Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d31fb4781e379879.xml - 2025-12-04T13:38:32.0033794Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0034045Z FAILED [32.9528s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0034102Z Traceback (most recent call last): 2025-12-04T13:38:32.0034267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0034310Z getattr(self, test_name)() 2025-12-04T13:38:32.0034470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0034518Z fn() 2025-12-04T13:38:32.0034669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0034711Z method(*args, **kwargs) 2025-12-04T13:38:32.0034862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0034903Z method(*args, **kwargs) 2025-12-04T13:38:32.0035054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0035093Z with policy(): 2025-12-04T13:38:32.0035244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0035287Z raise RuntimeError(msg) 2025-12-04T13:38:32.0035640Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 2025-12-04T13:38:32.0035644Z 2025-12-04T13:38:32.0035720Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0035953Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0035956Z 2025-12-04T13:38:32.0036041Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0036105Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
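Editor's note on the failure above: the leak checker enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 compares the caching-allocator bytes and the driver-allocated bytes on each device before and after the test body; in this run every rank grew from 512 B to 147,968 B in the allocator and by roughly 1.5 GB at the driver level. Below is a minimal sketch of that kind of before/after comparison for local debugging only; it is an approximation, not the actual test-harness implementation, and run_suspect_test() is a hypothetical placeholder standing in for the repro command printed above.

    import torch

    def snapshot(device: int):
        # Driver-level view (free, total) plus caching-allocator view.
        free, total = torch.cuda.mem_get_info(device)
        return torch.cuda.memory_allocated(device), total - free

    def check_for_leak(device: int = 0):
        torch.cuda.empty_cache()
        alloc_before, driver_before = snapshot(device)
        run_suspect_test()  # hypothetical helper, e.g. the failing FSDP test body
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after, driver_after = snapshot(device)
        if alloc_after > alloc_before or driver_after > driver_before:
            raise RuntimeError(
                f"possible leak: allocator {alloc_before} -> {alloc_after}, "
                f"driver-level {driver_before} -> {driver_after}"
            )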
2025-12-04T13:38:32.0036166Z ======================= 1 failed, 4 deselected in 33.11s ======================= 2025-12-04T13:38:32.0036204Z Got exit code 1 2025-12-04T13:38:32.0036246Z Retrying single test... 2025-12-04T13:38:32.0036445Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-10252b0d5be41435.xml 2025-12-04T13:38:32.0036503Z ============================= test session starts ============================== 2025-12-04T13:38:32.0036617Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0036658Z cachedir: .pytest_cache 2025-12-04T13:38:32.0036817Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0036863Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0036905Z configfile: pytest.ini 2025-12-04T13:38:32.0037068Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0037143Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0037375Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0037420Z Running 1 items in this shard 2025-12-04T13:38:32.0037422Z 2025-12-04T13:38:32.0037729Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda I1204 12:59:39.501000 382497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 382566 2025-12-04T13:38:32.0037885Z I1204 12:59:39.501000 382497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 382567 2025-12-04T13:38:32.0038049Z I1204 12:59:39.502000 382497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 382568 2025-12-04T13:38:32.0038198Z I1204 12:59:39.502000 382497 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 382569 2025-12-04T13:38:32.0038503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0038554Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0038840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0038888Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0039473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
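Editor's note on the _warn_cpu_init() UserWarning repeated above: FSDP is handed a CPU-resident module, so sharding initialization runs on CPU and sync_module_states=True cannot use GPU communication. A hedged sketch of the fix the warning itself recommends, passing device_id at wrap time, follows; the module construction and rank handling are illustrative placeholders, not code from this test.

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_device_id(module: torch.nn.Module) -> FSDP:
        # Pin the wrap to this rank's GPU so sharding init happens on-device.
        device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
        return FSDP(
            module,                    # may still be on CPU at this point
            device_id=device,          # moves the module to GPU for init
            sync_module_states=True,   # needs GPU communication, hence device_id
        )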
2025-12-04T13:38:32.0039512Z _warn_cpu_init() 2025-12-04T13:38:32.0039814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0039866Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0040436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0040478Z _warn_cpu_init() 2025-12-04T13:38:32.0041057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0041097Z _warn_cpu_init() 2025-12-04T13:38:32.0041392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0041439Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0042028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0042066Z _warn_cpu_init() 2025-12-04T13:38:32.0042353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0042450Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0042737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0042827Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0043114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0043187Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0043470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0043544Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0044819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0044945Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0045187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0045231Z return func(*args, **kwargs) 2025-12-04T13:38:32.0046493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0046618Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0047857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0047995Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0048222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0048265Z return func(*args, **kwargs) 2025-12-04T13:38:32.0048489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0048533Z return func(*args, **kwargs) 2025-12-04T13:38:32.0049856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0049993Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0050221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0050262Z return func(*args, **kwargs) 2025-12-04T13:38:32.0050488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0050529Z return func(*args, **kwargs) 2025-12-04T13:38:32.0050752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0050791Z return func(*args, **kwargs) 2025-12-04T13:38:32.0051024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:38:32.0051067Z return func(*args, **kwargs) 2025-12-04T13:38:32.0051286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0051325Z return func(*args, **kwargs) 2025-12-04T13:38:32.0051619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0051672Z return func(*args, **kwargs) 2025-12-04T13:38:32.0051820Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0051985Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0052290Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0052447Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0052732Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0052860Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0053138Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0053288Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0053563Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0053712Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0053992Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0054129Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0054419Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0054568Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0055053Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver
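Editor's note: two of the warnings above carry actionable guidance. The FutureWarning says the NO_SHARD sharding strategy is deprecated in favor of DistributedDataParallel, and the AccumulateGrad message notes that an intentional stream mismatch can be silenced with torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False). A minimal DDP-based sketch of the suggested replacement follows; the model and rank handling are illustrative placeholders, not code from this test.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_instead_of_no_shard(model: torch.nn.Module, rank: int) -> DDP:
        # DDP replicates (rather than shards) parameters, which is what
        # ShardingStrategy.NO_SHARD effectively did.
        model = model.to(torch.device("cuda", rank))
        return DDP(model, device_ids=[rank])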
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3810525184. 2025-12-04T13:38:32.0055184Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0055382Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0055743Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0055856Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0056081Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0056247Z [rank3]:E1204 13:00:10.576000 382569 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0056301Z dist init r=3, world=4 2025-12-04T13:38:32.0056439Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0056602Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0056892Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0057045Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0057335Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0057460Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0057741Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0057888Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0058169Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0058316Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0058605Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0058743Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0059018Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0059169Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0059709Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3860856832. 2025-12-04T13:38:32.0059826Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0060022Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0060377Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0060503Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0060728Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0060895Z [rank2]:E1204 13:00:10.618000 382568 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0060934Z dist init r=2, world=4 2025-12-04T13:38:32.0061072Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0061232Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0061522Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0061677Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0061966Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0062091Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0062367Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0062517Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0062806Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0062957Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0063232Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0063366Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0063645Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0063803Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0064284Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3877634048. 
2025-12-04T13:38:32.0064398Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0064602Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0064961Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0065082Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0065294Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0065457Z [rank1]:E1204 13:00:10.626000 382567 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0065499Z dist init r=1, world=4 2025-12-04T13:38:32.0065636Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0065798Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0066084Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0066239Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0066527Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0066650Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0066928Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0067085Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0067361Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0067506Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0067783Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0067932Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0068211Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0068359Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0068834Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 2025-12-04T13:38:32.0068960Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0069165Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0069522Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0069663Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0069874Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0070038Z [rank0]:E1204 13:00:10.628000 382566 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0070077Z dist init r=0, world=4 2025-12-04T13:38:32.0070415Z [rank0]:[W1204 13:00:10.889231270 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0070456Z FAILED [32.9531s] [100%] 2025-12-04T13:38:32.0070458Z 2025-12-04T13:38:32.0070517Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0070617Z __ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda ___ 2025-12-04T13:38:32.0070666Z Traceback (most recent call last): 2025-12-04T13:38:32.0070830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0070876Z self._join_processes(fn) 2025-12-04T13:38:32.0071050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0071105Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0071296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0071340Z raise RuntimeError(error) 2025-12-04T13:38:32.0071422Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0071467Z Traceback (most recent call last): 2025-12-04T13:38:32.0071629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0071672Z getattr(self, test_name)() 2025-12-04T13:38:32.0071832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0071867Z fn() 2025-12-04T13:38:32.0072032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0072075Z method(*args, **kwargs) 2025-12-04T13:38:32.0072229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0072270Z method(*args, **kwargs) 2025-12-04T13:38:32.0072423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0072459Z with policy(): 2025-12-04T13:38:32.0072614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0072667Z raise RuntimeError(msg) 2025-12-04T13:38:32.0073023Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3810525184. 
2025-12-04T13:38:32.0073037Z 2025-12-04T13:38:32.0073116Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0073348Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0073351Z 2025-12-04T13:38:32.0073440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0073442Z 2025-12-04T13:38:32.0073443Z 2025-12-04T13:38:32.0073518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0073610Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0073842Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-10252b0d5be41435.xml - 2025-12-04T13:38:32.0073907Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0074156Z FAILED [32.9531s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0074203Z Traceback (most recent call last): 2025-12-04T13:38:32.0074366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0074410Z getattr(self, test_name)() 2025-12-04T13:38:32.0074569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0074606Z fn() 2025-12-04T13:38:32.0074761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0074801Z method(*args, **kwargs) 2025-12-04T13:38:32.0074956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0075014Z method(*args, **kwargs) 2025-12-04T13:38:32.0075167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0075203Z with policy(): 2025-12-04T13:38:32.0075356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0075398Z raise RuntimeError(msg) 2025-12-04T13:38:32.0075752Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3810525184. 2025-12-04T13:38:32.0075755Z 2025-12-04T13:38:32.0075828Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0076073Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0076075Z 2025-12-04T13:38:32.0076160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0076225Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
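Editor's note: besides the leak itself, the run prints two process-group hygiene warnings: barrier() guessing the device (the message suggests passing device_id to init_process_group) and destroy_process_group() never being called before exit. A hedged sketch of the lifecycle both messages point at is below; the LOCAL_RANK environment variable is assumed to be provided by the launcher (e.g. torchrun) and is not taken from this log.

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["LOCAL_RANK"])  # assumed launcher-provided
        dist.init_process_group(
            backend="nccl",
            device_id=torch.device("cuda", rank),  # silences the barrier() warning
        )
        try:
            dist.barrier()  # collective now bound to an explicit device
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit

    if __name__ == "__main__":
        main()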
2025-12-04T13:38:32.0076287Z ====================== 1 failed, 32 deselected in 33.11s ======================= 2025-12-04T13:38:32.0076326Z Got exit code 1 2025-12-04T13:38:32.0076376Z Retrying single test... 2025-12-04T13:38:32.0076567Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9ad8d28cd3d2f972.xml 2025-12-04T13:38:32.0076627Z ============================= test session starts ============================== 2025-12-04T13:38:32.0076740Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0076793Z cachedir: .pytest_cache 2025-12-04T13:38:32.0076951Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0076999Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0077039Z configfile: pytest.ini 2025-12-04T13:38:32.0077204Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0077277Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0077504Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0077546Z Running 1 items in this shard 2025-12-04T13:38:32.0077548Z 2025-12-04T13:38:32.0077859Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda I1204 13:00:15.020000 382899 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 382968 2025-12-04T13:38:32.0078015Z I1204 13:00:15.020000 382899 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 382969 2025-12-04T13:38:32.0078168Z I1204 13:00:15.021000 382899 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 382970 2025-12-04T13:38:32.0078320Z I1204 13:00:15.021000 382899 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 382971 2025-12-04T13:38:32.0078613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0078666Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0078962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0079013Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0079627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0079669Z _warn_cpu_init() 2025-12-04T13:38:32.0079968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0080019Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0080593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0080647Z _warn_cpu_init() 2025-12-04T13:38:32.0081222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0081275Z _warn_cpu_init() 2025-12-04T13:38:32.0081563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0081613Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0082182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0082222Z _warn_cpu_init() 2025-12-04T13:38:32.0082509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0082589Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0082873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0082952Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0083238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0083314Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0083617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0083689Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0084963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0085092Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0086346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0086492Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0087781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0087903Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0088133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0088181Z return func(*args, **kwargs) 2025-12-04T13:38:32.0089460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0089620Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0089849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0089891Z return func(*args, **kwargs) 2025-12-04T13:38:32.0090117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0090173Z return func(*args, **kwargs) 2025-12-04T13:38:32.0090398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0090439Z return func(*args, **kwargs) 2025-12-04T13:38:32.0090868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0090910Z return func(*args, **kwargs) 2025-12-04T13:38:32.0091129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0091171Z return func(*args, **kwargs) 2025-12-04T13:38:32.0091389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.0091432Z return func(*args, **kwargs) 2025-12-04T13:38:32.0091654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0091697Z return func(*args, **kwargs) 2025-12-04T13:38:32.0091990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0092033Z return func(*args, **kwargs) 2025-12-04T13:38:32.0092177Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0092343Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0092635Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0092792Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0093096Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0093222Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0093506Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0093656Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0093946Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0094094Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0094375Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0094514Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0094804Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0094954Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0095447Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver 
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 2025-12-04T13:38:32.0095566Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0095762Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0096127Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0096244Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0096456Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0096623Z [rank0]:E1204 13:00:46.055000 382968 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0096663Z dist init r=0, world=4 2025-12-04T13:38:32.0096803Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0096963Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0097263Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0097417Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0097706Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0097833Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0098120Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0098271Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0098545Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0098693Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0098977Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0099115Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0099404Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0099550Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0100062Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3860856832. 2025-12-04T13:38:32.0100180Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0100381Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0100739Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0100854Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0101069Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0101232Z [rank2]:E1204 13:00:46.064000 382970 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0101273Z dist init r=2, world=4 2025-12-04T13:38:32.0101424Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0101585Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0101869Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0102027Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0102312Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0102450Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0102728Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0102875Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0103153Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0103311Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0103588Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0103737Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0104016Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0104168Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0104648Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3810525184. 
2025-12-04T13:38:32.0104764Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0104957Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0105314Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0105428Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0105642Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0105817Z [rank3]:E1204 13:00:46.066000 382971 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0105857Z dist init r=3, world=4 2025-12-04T13:38:32.0105997Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0106154Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0106444Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0106605Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0106893Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0107015Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0107293Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0107458Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0107734Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0107898Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0108176Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0123456Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0123771Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0123932Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0124419Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3877634048. 2025-12-04T13:38:32.0124541Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0124744Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0125114Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0125286Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0125504Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0125673Z [rank1]:E1204 13:00:46.110000 382969 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0125717Z dist init r=1, world=4 2025-12-04T13:38:32.0126059Z [rank0]:[W1204 13:00:46.246017877 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0126102Z FAILED [32.8533s] [100%] 2025-12-04T13:38:32.0126124Z 2025-12-04T13:38:32.0126188Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0126291Z __ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda ___ 2025-12-04T13:38:32.0126344Z Traceback (most recent call last): 2025-12-04T13:38:32.0126514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0126562Z self._join_processes(fn) 2025-12-04T13:38:32.0126739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0126811Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0126996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0127041Z raise RuntimeError(error) 2025-12-04T13:38:32.0127127Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0127192Z Traceback (most recent call last): 2025-12-04T13:38:32.0127362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0127406Z getattr(self, test_name)() 2025-12-04T13:38:32.0127570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0127606Z fn() 2025-12-04T13:38:32.0127763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0127806Z method(*args, **kwargs) 2025-12-04T13:38:32.0127962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0128003Z method(*args, **kwargs) 2025-12-04T13:38:32.0128161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0128201Z with policy(): 2025-12-04T13:38:32.0128361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0128402Z raise RuntimeError(msg) 2025-12-04T13:38:32.0128762Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 
2025-12-04T13:38:32.0128766Z 2025-12-04T13:38:32.0128843Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0129081Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0129085Z 2025-12-04T13:38:32.0129178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0129192Z 2025-12-04T13:38:32.0129194Z 2025-12-04T13:38:32.0129273Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0129366Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0129659Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9ad8d28cd3d2f972.xml - 2025-12-04T13:38:32.0129723Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0129979Z FAILED [32.8533s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0130028Z Traceback (most recent call last): 2025-12-04T13:38:32.0130214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0130266Z getattr(self, test_name)() 2025-12-04T13:38:32.0130429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0130467Z fn() 2025-12-04T13:38:32.0130622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0130665Z method(*args, **kwargs) 2025-12-04T13:38:32.0130824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0130877Z method(*args, **kwargs) 2025-12-04T13:38:32.0131033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0131071Z with policy(): 2025-12-04T13:38:32.0131245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0131288Z raise RuntimeError(msg) 2025-12-04T13:38:32.0131644Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4013948928. 2025-12-04T13:38:32.0131647Z 2025-12-04T13:38:32.0131722Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0131959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0131961Z 2025-12-04T13:38:32.0132049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0132118Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
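Two warnings repeated in this run point at process-group hygiene: barrier() complains that no device is bound to the group ("You can specify `device_id` in `init_process_group` to mute this warning"), and ProcessGroupNCCL warns that destroy_process_group() was never called before program exit. A hedged sketch of the recommended pattern, assuming a launcher such as torchrun has already set LOCAL_RANK/RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT (the module and function names below are illustrative, not taken from this test suite):

    import os
    import torch
    import torch.distributed as dist

    def main():
        # Pin this process to its GPU and bind the process group to that device,
        # which silences the barrier() device warning.
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        dist.init_process_group(
            backend="nccl",
            device_id=torch.device("cuda", local_rank),
        )
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            # Explicit teardown avoids the "destroy_process_group() was not called
            # before program exit" warning and releases communicator resources.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()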
2025-12-04T13:38:32.0132183Z ====================== 1 failed, 32 deselected in 33.00s ======================= 2025-12-04T13:38:32.0132225Z Got exit code 1 2025-12-04T13:38:32.0132412Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda 2025-12-04T13:38:32.0132542Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0132737Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4b715ed61ff2d6c7.xml 2025-12-04T13:38:32.0132798Z ============================= test session starts ============================== 2025-12-04T13:38:32.0132916Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0132958Z cachedir: .pytest_cache 2025-12-04T13:38:32.0133123Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0133184Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0133229Z configfile: pytest.ini 2025-12-04T13:38:32.0133394Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0133472Z collecting ... collected 60 items / 5 deselected / 55 selected 2025-12-04T13:38:32.0133527Z stepcurrent: skipping 5 already run items. 2025-12-04T13:38:32.0133574Z Running 28 items in this shard 2025-12-04T13:38:32.0133577Z 2025-12-04T13:38:32.0133887Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda I1204 13:00:50.334000 383301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 383370 2025-12-04T13:38:32.0134057Z I1204 13:00:50.335000 383301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 383371 2025-12-04T13:38:32.0134213Z I1204 13:00:50.335000 383301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 383372 2025-12-04T13:38:32.0134367Z I1204 13:00:50.336000 383301 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 383373 2025-12-04T13:38:32.0134670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0134734Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0135330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0135379Z _warn_cpu_init() 2025-12-04T13:38:32.0135672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0135723Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0136299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0136341Z _warn_cpu_init() 2025-12-04T13:38:32.0136630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0136684Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0137259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0137300Z _warn_cpu_init() 2025-12-04T13:38:32.0137599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0137652Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0138225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0138265Z _warn_cpu_init() 2025-12-04T13:38:32.0138565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0138648Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0138938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0139014Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0139305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0139392Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0139723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0139815Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0140108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0140155Z return func(*args, **kwargs) 2025-12-04T13:38:32.0140385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0140433Z return func(*args, **kwargs) 2025-12-04T13:38:32.0140655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0140701Z return func(*args, **kwargs) 2025-12-04T13:38:32.0140925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0140971Z return func(*args, **kwargs) 2025-12-04T13:38:32.0141195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0141236Z return func(*args, **kwargs) 2025-12-04T13:38:32.0141460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0141502Z return func(*args, **kwargs) 2025-12-04T13:38:32.0141724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0141766Z return func(*args, **kwargs) 2025-12-04T13:38:32.0142001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0142043Z return func(*args, **kwargs) 2025-12-04T13:38:32.0142264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.0142305Z return func(*args, **kwargs) 2025-12-04T13:38:32.0142454Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0142622Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0142931Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0143093Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0143378Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0143508Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0143800Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0143953Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0144239Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0144390Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0144665Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0144807Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0145092Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0145241Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0145721Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3990880256. 
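The repeated _warn_cpu_init() UserWarning earlier in this session says the wrapped module is still on CPU and recommends passing `device_id` so FSDP moves it to the GPU before sharding initialization (also required for sync_module_states=True). A minimal sketch of that call shape, assuming a process group is already initialized and using a placeholder module rather than the model this test actually builds:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = torch.nn.Linear(1024, 1024)  # placeholder module, still on CPU
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # move to this rank's GPU before sharding init
        sync_module_states=True,                 # valid now that init happens on the GPU
    )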
2025-12-04T13:38:32.0145839Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0146038Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0146418Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0146532Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0146748Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0146913Z [rank0]:E1204 13:01:27.673000 383370 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0146956Z dist init r=0, world=4 2025-12-04T13:38:32.0147105Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0147270Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0147558Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0147716Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0148014Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0148138Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0148431Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0148578Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0148855Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0149002Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0149282Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0149422Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0149749Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0149900Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0150378Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3854565376. 2025-12-04T13:38:32.0150508Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0150705Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0151066Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0151183Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0151397Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0151584Z [rank1]:E1204 13:01:27.675000 383371 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0151624Z dist init r=1, world=4 2025-12-04T13:38:32.0151767Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0151925Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0152216Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0152382Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0152670Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0152812Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0153088Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0153238Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0153518Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0153673Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0153951Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0154090Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0154369Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0154517Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0155003Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 2300575744 and is now 3837788160. 2025-12-04T13:38:32.0155119Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0155317Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0155677Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0155801Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0156018Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0156182Z [rank2]:E1204 13:01:27.681000 383372 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0156224Z dist init r=2, world=4 2025-12-04T13:38:32.0156361Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0156535Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0156821Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0156988Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.0157271Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0157397Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0157677Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0157827Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0158108Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0158255Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0158531Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0158668Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0158949Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0159110Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0159624Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 
2025-12-04T13:38:32.0159745Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0159939Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0160317Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0160431Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0160645Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0160824Z [rank3]:E1204 13:01:27.745000 383373 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0160864Z dist init r=3, world=4 2025-12-04T13:38:32.0161203Z [rank0]:[W1204 13:01:27.854831028 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0161263Z FAILED [39.1581s] [ 3%] 2025-12-04T13:38:32.0161266Z 2025-12-04T13:38:32.0161327Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0161429Z ___ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda ___ 2025-12-04T13:38:32.0161480Z Traceback (most recent call last): 2025-12-04T13:38:32.0161644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0161692Z self._join_processes(fn) 2025-12-04T13:38:32.0161867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0161924Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0162104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0162154Z raise RuntimeError(error) 2025-12-04T13:38:32.0162234Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0162282Z Traceback (most recent call last): 2025-12-04T13:38:32.0162443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0162490Z getattr(self, test_name)() 2025-12-04T13:38:32.0162652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0162690Z fn() 2025-12-04T13:38:32.0162846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0162887Z method(*args, **kwargs) 2025-12-04T13:38:32.0163044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0163085Z method(*args, **kwargs) 2025-12-04T13:38:32.0163255Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0163295Z with policy(): 2025-12-04T13:38:32.0163451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0163493Z raise RuntimeError(msg) 2025-12-04T13:38:32.0163845Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3990880256. 2025-12-04T13:38:32.0163848Z 2025-12-04T13:38:32.0163924Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0164174Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0164178Z 2025-12-04T13:38:32.0164267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0164272Z 2025-12-04T13:38:32.0164332Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0164381Z Traceback (most recent call last): 2025-12-04T13:38:32.0164544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0164602Z getattr(self, test_name)() 2025-12-04T13:38:32.0164760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0164799Z fn() 2025-12-04T13:38:32.0164954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0165009Z method(*args, **kwargs) 2025-12-04T13:38:32.0165163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0165206Z method(*args, **kwargs) 2025-12-04T13:38:32.0165358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0165400Z with policy(): 2025-12-04T13:38:32.0165553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0165599Z raise RuntimeError(msg) 2025-12-04T13:38:32.0165948Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3854565376. 
2025-12-04T13:38:32.0165952Z 2025-12-04T13:38:32.0166035Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0166266Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0166272Z 2025-12-04T13:38:32.0166359Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0166361Z 2025-12-04T13:38:32.0166363Z 2025-12-04T13:38:32.0166444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0166534Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0166771Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4b715ed61ff2d6c7.xml - 2025-12-04T13:38:32.0166832Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0167095Z FAILED [39.1581s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0167284Z Traceback (most recent call last): 2025-12-04T13:38:32.0167453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0167495Z getattr(self, test_name)() 2025-12-04T13:38:32.0167660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0167697Z fn() 2025-12-04T13:38:32.0167853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0167895Z method(*args, **kwargs) 2025-12-04T13:38:32.0168062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0168109Z method(*args, **kwargs) 2025-12-04T13:38:32.0168261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0168304Z with policy(): 2025-12-04T13:38:32.0168459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0168503Z raise RuntimeError(msg) 2025-12-04T13:38:32.0168856Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3990880256. 
2025-12-04T13:38:32.0168869Z 2025-12-04T13:38:32.0168946Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0169179Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0169191Z 2025-12-04T13:38:32.0169281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0169283Z 2025-12-04T13:38:32.0169342Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0169391Z Traceback (most recent call last): 2025-12-04T13:38:32.0169554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0169638Z getattr(self, test_name)() 2025-12-04T13:38:32.0169803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0169838Z fn() 2025-12-04T13:38:32.0169994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0170035Z method(*args, **kwargs) 2025-12-04T13:38:32.0170193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0170233Z method(*args, **kwargs) 2025-12-04T13:38:32.0170389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0170426Z with policy(): 2025-12-04T13:38:32.0170580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0170622Z raise RuntimeError(msg) 2025-12-04T13:38:32.0170973Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3854565376. 2025-12-04T13:38:32.0170976Z 2025-12-04T13:38:32.0171049Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0171297Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0171299Z 2025-12-04T13:38:32.0171387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0171458Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0171524Z ======================= 1 failed, 5 deselected in 39.32s ======================= 2025-12-04T13:38:32.0171562Z Got exit code 1 2025-12-04T13:38:32.0171606Z Retrying single test... 
2025-12-04T13:38:32.0171795Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69f8ccb79d38e88f.xml 2025-12-04T13:38:32.0171870Z ============================= test session starts ============================== 2025-12-04T13:38:32.0171988Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0172032Z cachedir: .pytest_cache 2025-12-04T13:38:32.0172194Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0172244Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0172285Z configfile: pytest.ini 2025-12-04T13:38:32.0172451Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0172539Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0172764Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0172808Z Running 1 items in this shard 2025-12-04T13:38:32.0172823Z 2025-12-04T13:38:32.0173134Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda I1204 13:01:32.158000 383703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 383772 2025-12-04T13:38:32.0173290Z I1204 13:01:32.159000 383703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 383773 2025-12-04T13:38:32.0173446Z I1204 13:01:32.159000 383703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 383774 2025-12-04T13:38:32.0173603Z I1204 13:01:32.160000 383703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 383775 2025-12-04T13:38:32.0173896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0173952Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0174530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0174572Z _warn_cpu_init() 2025-12-04T13:38:32.0174861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0174915Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0175498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0175537Z _warn_cpu_init() 2025-12-04T13:38:32.0175827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0175878Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0176177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0176226Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0176801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0176859Z _warn_cpu_init() 2025-12-04T13:38:32.0177428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0177481Z _warn_cpu_init() 2025-12-04T13:38:32.0177769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0177849Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0178145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0178225Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0178515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0178589Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0178876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0178948Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0179243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0179287Z return func(*args, **kwargs) 2025-12-04T13:38:32.0179519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0179562Z return func(*args, **kwargs) 2025-12-04T13:38:32.0179840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0179883Z return func(*args, **kwargs) 2025-12-04T13:38:32.0180108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0180149Z return func(*args, **kwargs) 2025-12-04T13:38:32.0180373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0180414Z return func(*args, **kwargs) 2025-12-04T13:38:32.0180652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0180697Z return func(*args, **kwargs) 2025-12-04T13:38:32.0180917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0180961Z return func(*args, **kwargs) 2025-12-04T13:38:32.0181177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0181231Z return func(*args, **kwargs) 2025-12-04T13:38:32.0181452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.0181493Z return func(*args, **kwargs) 2025-12-04T13:38:32.0181639Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0181818Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0182110Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0182266Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0182557Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0182682Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0182966Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0183118Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0183398Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0183546Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0183833Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0183973Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0184249Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0184398Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0184884Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 
2025-12-04T13:38:32.0185005Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0185200Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0185564Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0185720Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0185932Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0186112Z [rank3]:E1204 13:02:09.629000 383775 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0186151Z dist init r=3, world=4 2025-12-04T13:38:32.0186290Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0186448Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0186738Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0186891Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0187181Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0187306Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0187583Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0187734Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0188010Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0188168Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0188445Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0188583Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0188862Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0189008Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0189494Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 2300575744 and is now 3837788160. 2025-12-04T13:38:32.0189642Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0189841Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0190218Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0190343Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0190554Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0190717Z [rank2]:E1204 13:02:09.668000 383774 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0190757Z dist init r=2, world=4 2025-12-04T13:38:32.0190894Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0191055Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0191342Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0191497Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0191782Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0191904Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0192183Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0192331Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0192622Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0192769Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0193046Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0193182Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0193483Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0193633Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0194106Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3854565376. 2025-12-04T13:38:32.0194231Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0194426Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0194795Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0194909Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0195121Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0195286Z [rank1]:E1204 13:02:09.670000 383773 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0195324Z dist init r=1, world=4 2025-12-04T13:38:32.0195463Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0195624Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0195913Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0196067Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.0196355Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0196480Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0196769Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0196916Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0197191Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0197341Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0197628Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0197767Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0198044Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0198195Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0198679Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3990880256. 
2025-12-04T13:38:32.0198804Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0199002Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0199356Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0199470Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0199719Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0199885Z [rank0]:E1204 13:02:09.683000 383772 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0199925Z dist init r=0, world=4 2025-12-04T13:38:32.0200265Z [rank0]:[W1204 13:02:09.961694222 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0200307Z FAILED [39.4593s] [100%] 2025-12-04T13:38:32.0200309Z 2025-12-04T13:38:32.0200365Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0200466Z ___ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda ___ 2025-12-04T13:38:32.0200512Z Traceback (most recent call last): 2025-12-04T13:38:32.0200679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0200723Z self._join_processes(fn) 2025-12-04T13:38:32.0200913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0200968Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0201148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0201191Z raise RuntimeError(error) 2025-12-04T13:38:32.0201272Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0201318Z Traceback (most recent call last): 2025-12-04T13:38:32.0201481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0201525Z getattr(self, test_name)() 2025-12-04T13:38:32.0201696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0201734Z fn() 2025-12-04T13:38:32.0201887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0201930Z method(*args, **kwargs) 2025-12-04T13:38:32.0202080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0202123Z method(*args, **kwargs) 2025-12-04T13:38:32.0202273Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0202326Z with policy(): 2025-12-04T13:38:32.0202479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0202523Z raise RuntimeError(msg) 2025-12-04T13:38:32.0202875Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 2025-12-04T13:38:32.0202894Z 2025-12-04T13:38:32.0202973Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0203203Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0203207Z 2025-12-04T13:38:32.0203294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0203297Z 2025-12-04T13:38:32.0203299Z 2025-12-04T13:38:32.0203375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0203465Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0203700Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-69f8ccb79d38e88f.xml - 2025-12-04T13:38:32.0203761Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0204011Z FAILED [39.4593s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0204056Z Traceback (most recent call last): 2025-12-04T13:38:32.0204222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0204265Z getattr(self, test_name)() 2025-12-04T13:38:32.0204428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0204462Z fn() 2025-12-04T13:38:32.0204618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0204668Z method(*args, **kwargs) 2025-12-04T13:38:32.0204825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0204864Z method(*args, **kwargs) 2025-12-04T13:38:32.0205015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0205051Z with policy(): 2025-12-04T13:38:32.0205205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0205249Z raise RuntimeError(msg) 2025-12-04T13:38:32.0205607Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 
2025-12-04T13:38:32.0205610Z 2025-12-04T13:38:32.0205687Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0205919Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0205921Z 2025-12-04T13:38:32.0206009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0206072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0206147Z ====================== 1 failed, 32 deselected in 39.62s ======================= 2025-12-04T13:38:32.0206183Z Got exit code 1 2025-12-04T13:38:32.0206225Z Retrying single test... 2025-12-04T13:38:32.0206413Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0905abac027446cf.xml 2025-12-04T13:38:32.0206485Z ============================= test session starts ============================== 2025-12-04T13:38:32.0206599Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0206642Z cachedir: .pytest_cache 2025-12-04T13:38:32.0206801Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0206848Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0206888Z configfile: pytest.ini 2025-12-04T13:38:32.0207053Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0207129Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0207354Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0207401Z Running 1 items in this shard 2025-12-04T13:38:32.0207403Z 2025-12-04T13:38:32.0207712Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda I1204 13:02:14.070000 384105 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 384174 2025-12-04T13:38:32.0207869Z I1204 13:02:14.071000 384105 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 384175 2025-12-04T13:38:32.0208020Z I1204 13:02:14.071000 384105 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 384176 2025-12-04T13:38:32.0208176Z I1204 13:02:14.072000 384105 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 384177 2025-12-04T13:38:32.0208469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0208523Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0209121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0209160Z _warn_cpu_init() 2025-12-04T13:38:32.0209450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0209498Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0210143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0210182Z _warn_cpu_init() 2025-12-04T13:38:32.0210465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0210531Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0210815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0210877Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0211455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0213977Z _warn_cpu_init() 2025-12-04T13:38:32.0214548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0214588Z _warn_cpu_init() 2025-12-04T13:38:32.0214875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0214953Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0215240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0215332Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0215640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0215713Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0216000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0216071Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0216364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0216409Z return func(*args, **kwargs) 2025-12-04T13:38:32.0216653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0216696Z return func(*args, **kwargs) 2025-12-04T13:38:32.0216922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0216966Z return func(*args, **kwargs) 2025-12-04T13:38:32.0217185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0217239Z return func(*args, **kwargs) 2025-12-04T13:38:32.0217460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0217502Z return func(*args, **kwargs) 2025-12-04T13:38:32.0217725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0217766Z return func(*args, **kwargs) 2025-12-04T13:38:32.0217985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0218027Z return func(*args, **kwargs) 2025-12-04T13:38:32.0218246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0218364Z return func(*args, **kwargs) 2025-12-04T13:38:32.0218580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.0218622Z return func(*args, **kwargs) 2025-12-04T13:38:32.0218769Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0218935Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0219229Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0219388Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0219722Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0219862Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0220144Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0220292Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0220572Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0220722Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0221012Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0221150Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0221426Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0221590Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0222072Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 
2025-12-04T13:38:32.0222191Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0222390Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0222748Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0222892Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0223105Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0223273Z [rank3]:E1204 13:02:51.310000 384177 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0223313Z dist init r=3, world=4 2025-12-04T13:38:32.0223453Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0223611Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0223902Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0224057Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0224353Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0224480Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0224757Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0224908Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0225196Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0225345Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0225623Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0225759Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0226056Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0226206Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0226687Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 2300575744 and is now 3837788160. 2025-12-04T13:38:32.0226803Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0227000Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0227369Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0227482Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0227695Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0227858Z [rank2]:E1204 13:02:51.320000 384176 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0227899Z dist init r=2, world=4 2025-12-04T13:38:32.0228034Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0228196Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0228493Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0228646Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0228932Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0229057Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0229335Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0229491Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0229897Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0230042Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0230320Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0230472Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0230752Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0230902Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0231375Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3990880256. 2025-12-04T13:38:32.0231505Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0231702Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0232062Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0232177Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0232388Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0232554Z [rank0]:E1204 13:02:51.361000 384174 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0232592Z dist init r=0, world=4 2025-12-04T13:38:32.0232730Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0232903Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0233189Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0233342Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.0233627Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0233765Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0234044Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0234193Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0234468Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0234626Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0234904Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0235040Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0235318Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0235465Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0235952Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3854565376. 
2025-12-04T13:38:32.0236069Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0236267Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0236623Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0236736Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0236948Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0237121Z [rank1]:E1204 13:02:51.371000 384175 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0237161Z dist init r=1, world=4 2025-12-04T13:38:32.0237495Z [rank0]:[W1204 13:02:51.636590405 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0237538Z FAILED [38.9577s] [100%] 2025-12-04T13:38:32.0237541Z 2025-12-04T13:38:32.0237597Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0237700Z ___ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda ___ 2025-12-04T13:38:32.0237747Z Traceback (most recent call last): 2025-12-04T13:38:32.0237923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0237967Z self._join_processes(fn) 2025-12-04T13:38:32.0238140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0238198Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0238377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0238422Z raise RuntimeError(error) 2025-12-04T13:38:32.0238512Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0238560Z Traceback (most recent call last): 2025-12-04T13:38:32.0238721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0238765Z getattr(self, test_name)() 2025-12-04T13:38:32.0238923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0238961Z fn() 2025-12-04T13:38:32.0239112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0239154Z method(*args, **kwargs) 2025-12-04T13:38:32.0239427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0239470Z method(*args, **kwargs) 2025-12-04T13:38:32.0239646Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0239705Z with policy(): 2025-12-04T13:38:32.0239858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0239902Z raise RuntimeError(msg) 2025-12-04T13:38:32.0240253Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 2025-12-04T13:38:32.0240258Z 2025-12-04T13:38:32.0240333Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0240568Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0240571Z 2025-12-04T13:38:32.0240658Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0240661Z 2025-12-04T13:38:32.0240663Z 2025-12-04T13:38:32.0240741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0240828Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0241074Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0905abac027446cf.xml - 2025-12-04T13:38:32.0241135Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0241387Z FAILED [38.9577s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0241433Z Traceback (most recent call last): 2025-12-04T13:38:32.0241599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0241644Z getattr(self, test_name)() 2025-12-04T13:38:32.0241806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0241842Z fn() 2025-12-04T13:38:32.0242013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0242055Z method(*args, **kwargs) 2025-12-04T13:38:32.0242208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0242249Z method(*args, **kwargs) 2025-12-04T13:38:32.0242400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0242438Z with policy(): 2025-12-04T13:38:32.0242604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0242647Z raise RuntimeError(msg) 2025-12-04T13:38:32.0242995Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3787456512. 
2025-12-04T13:38:32.0243000Z 2025-12-04T13:38:32.0243076Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0243306Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0243308Z 2025-12-04T13:38:32.0243395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0243459Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0243538Z ====================== 1 failed, 32 deselected in 39.12s ======================= 2025-12-04T13:38:32.0243576Z Got exit code 1 2025-12-04T13:38:32.0243758Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda 2025-12-04T13:38:32.0243888Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0244077Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7d6773e1cbecc3a5.xml 2025-12-04T13:38:32.0244137Z ============================= test session starts ============================== 2025-12-04T13:38:32.0244248Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0244294Z cachedir: .pytest_cache 2025-12-04T13:38:32.0244452Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0244501Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0244541Z configfile: pytest.ini 2025-12-04T13:38:32.0244705Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0244780Z collecting ... collected 60 items / 6 deselected / 54 selected 2025-12-04T13:38:32.0244844Z stepcurrent: skipping 6 already run items. 2025-12-04T13:38:32.0244887Z Running 27 items in this shard 2025-12-04T13:38:32.0244889Z 2025-12-04T13:38:32.0245206Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda I1204 13:02:55.729000 384507 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 384576 2025-12-04T13:38:32.0245363Z I1204 13:02:55.730000 384507 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 384577 2025-12-04T13:38:32.0245521Z I1204 13:02:55.730000 384507 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 384578 2025-12-04T13:38:32.0245673Z I1204 13:02:55.731000 384507 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 384579 2025-12-04T13:38:32.0246267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
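The pytest session header above reports Hypothesis running under a 'pytorch_ci' profile (no example database, 50 examples, derandomized, the too_slow health check suppressed). Roughly how such a profile is registered and activated; where PyTorch's test suite actually does this is not shown in this log:

    from hypothesis import HealthCheck, settings

    # Mirrors the profile values printed in the pytest session header.
    settings.register_profile(
        "pytorch_ci",
        database=None,
        max_examples=50,
        derandomize=True,
        suppress_health_check=[HealthCheck.too_slow],
    )
    settings.load_profile("pytorch_ci")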
2025-12-04T13:38:32.0246307Z _warn_cpu_init() 2025-12-04T13:38:32.0246881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0246931Z _warn_cpu_init() 2025-12-04T13:38:32.0247501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0247538Z _warn_cpu_init() 2025-12-04T13:38:32.0248119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0248169Z _warn_cpu_init() 2025-12-04T13:38:32.0248461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
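The barrier() UserWarning above means the process group is not bound to a device, so collectives have to infer one from the current context. As the message says, passing `device_id` to `init_process_group` silences it. A minimal sketch, assuming a torchrun-style launch where MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE and LOCAL_RANK are already set in the environment:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Binding the group to a specific device lets barrier() and other
    # collectives use it directly instead of guessing from the current context.
    dist.init_process_group(
        backend="nccl",  # RCCL on ROCm also registers as "nccl"
        device_id=torch.device("cuda", local_rank),
    )
    dist.barrier()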
2025-12-04T13:38:32.0248503Z return func(*args, **kwargs) 2025-12-04T13:38:32.0248646Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0248810Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0249101Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0249271Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0249555Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0249720Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0249999Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0250152Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0250445Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0250592Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0250871Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0251019Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0251301Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0251450Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0251938Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:38:32.0252056Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0252265Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0252631Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0252743Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0252958Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0253122Z [rank0]:E1204 13:03:33.298000 384576 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0253165Z dist init r=0, world=4 2025-12-04T13:38:32.0253301Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0253463Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0253766Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0253919Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0254205Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0254330Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0254621Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0254768Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0255046Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0255193Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0255483Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0255622Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0255900Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0256049Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0256526Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0256654Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0256850Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0257212Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0257324Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0257536Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0257703Z [rank3]:E1204 13:03:33.300000 384579 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0257857Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0258020Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0258308Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0258461Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0258759Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0258882Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0259161Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0259311Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0259618Z 
[rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0259779Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0260056Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0260192Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0260468Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0260620Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0261112Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:38:32.0261228Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0261424Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0261786Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0261902Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0262127Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0262291Z [rank2]:E1204 13:03:33.300000 384578 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0262329Z dist init r=3, world=4 2025-12-04T13:38:32.0262370Z dist init r=2, world=4 2025-12-04T13:38:32.0262506Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0262668Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0262958Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0263128Z [rank1]:E1204 13:03:33.309000 384577 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0263414Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0263535Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0263812Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0263972Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0264250Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0264396Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0264672Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0264811Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0265098Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0265248Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0265726Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
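Each failing run in this log also ends with a ProcessGroupNCCL warning (seen above after the first failure and again below) that destroy_process_group() was not called before program exit. In user code the usual pattern is an explicit teardown; a sketch, assuming a torchrun-style launch:

    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")
        try:
            pass  # training / test body goes here
        finally:
            # Explicit teardown avoids the "destroy_process_group() was not
            # called before program exit" warning and frees communicator state.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()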
2025-12-04T13:38:32.0265841Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0266040Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0266413Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0266528Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0266737Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0266902Z [rank1]:E1204 13:03:33.309000 384577 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0266942Z dist init r=1, world=4 2025-12-04T13:38:32.0267287Z [rank0]:[W1204 13:03:33.560839773 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0267327Z FAILED [39.5588s] [ 3%] 2025-12-04T13:38:32.0267329Z 2025-12-04T13:38:32.0267387Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0267491Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.0267540Z Traceback (most recent call last): 2025-12-04T13:38:32.0267704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0267747Z self._join_processes(fn) 2025-12-04T13:38:32.0267932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0267985Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0268165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0268209Z raise RuntimeError(error) 2025-12-04T13:38:32.0268292Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0268337Z Traceback (most recent call last): 2025-12-04T13:38:32.0268501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0268543Z getattr(self, test_name)() 2025-12-04T13:38:32.0268703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0268737Z fn() 2025-12-04T13:38:32.0268893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0268945Z method(*args, **kwargs) 2025-12-04T13:38:32.0269098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0269139Z method(*args, **kwargs) 2025-12-04T13:38:32.0269293Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0269331Z with policy(): 2025-12-04T13:38:32.0269483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0269525Z raise RuntimeError(msg) 2025-12-04T13:38:32.0269906Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0269910Z 2025-12-04T13:38:32.0269986Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0270222Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0270225Z 2025-12-04T13:38:32.0270326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0270329Z 2025-12-04T13:38:32.0270330Z 2025-12-04T13:38:32.0270406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0270496Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0270731Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7d6773e1cbecc3a5.xml - 2025-12-04T13:38:32.0270796Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0271052Z FAILED [39.5588s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0271112Z Traceback (most recent call last): 2025-12-04T13:38:32.0271279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0271321Z getattr(self, test_name)() 2025-12-04T13:38:32.0271483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0271517Z fn() 2025-12-04T13:38:32.0271671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0271723Z method(*args, **kwargs) 2025-12-04T13:38:32.0271878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0271917Z method(*args, **kwargs) 2025-12-04T13:38:32.0272072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0272108Z with policy(): 2025-12-04T13:38:32.0272262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0272301Z raise RuntimeError(msg) 2025-12-04T13:38:32.0272658Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. 
CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0272661Z 2025-12-04T13:38:32.0272734Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0272993Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0272995Z 2025-12-04T13:38:32.0273083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0273147Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0273211Z ======================= 1 failed, 6 deselected in 39.72s ======================= 2025-12-04T13:38:32.0273247Z Got exit code 1 2025-12-04T13:38:32.0273289Z Retrying single test... 2025-12-04T13:38:32.0273479Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-159aa8b84e6f8c02.xml 2025-12-04T13:38:32.0273538Z ============================= test session starts ============================== 2025-12-04T13:38:32.0273651Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0273693Z cachedir: .pytest_cache 2025-12-04T13:38:32.0273852Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0273900Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0273940Z configfile: pytest.ini 2025-12-04T13:38:32.0274115Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0274189Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0274419Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0274462Z Running 1 items in this shard 2025-12-04T13:38:32.0274465Z 2025-12-04T13:38:32.0274780Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda I1204 13:03:37.835000 384909 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 384978 2025-12-04T13:38:32.0274946Z I1204 13:03:37.836000 384909 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 384979 2025-12-04T13:38:32.0275102Z I1204 13:03:37.836000 384909 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 384980 2025-12-04T13:38:32.0275254Z I1204 13:03:37.837000 384909 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 384981 2025-12-04T13:38:32.0275835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0275888Z _warn_cpu_init() 2025-12-04T13:38:32.0276453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0276492Z _warn_cpu_init() 2025-12-04T13:38:32.0277058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0277106Z _warn_cpu_init() 2025-12-04T13:38:32.0277668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0277704Z _warn_cpu_init() 2025-12-04T13:38:32.0278000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.0278046Z return func(*args, **kwargs) 2025-12-04T13:38:32.0278187Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0278360Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0278651Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0278807Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0279093Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0279221Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0279511Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0279689Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0280037Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0280201Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0280483Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0280619Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0280900Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0281047Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0281533Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
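Annotation: the numbers in the RuntimeError above come from comparing per-device memory counters taken before and after the test body. A rough sketch of that idea using public torch.cuda APIs (an illustration only; `run_workload` is a hypothetical stand-in, and the real leak check in common_utils.py is more careful than this):

import torch

dev = torch.device("cuda", 0)

def run_workload():
    # hypothetical stand-in for the test body being checked
    x = torch.randn(1024, 1024, device=dev)
    return (x @ x).sum().item()

torch.cuda.synchronize(dev)
alloc_before = torch.cuda.memory_allocated(dev)   # caching-allocator bytes in use
free_before, total = torch.cuda.mem_get_info(dev)
driver_before = total - free_before               # driver-level bytes in use

run_workload()

torch.cuda.synchronize(dev)
torch.cuda.empty_cache()
alloc_after = torch.cuda.memory_allocated(dev)
free_after, _ = torch.cuda.mem_get_info(dev)
driver_after = total - free_after
if alloc_after > alloc_before or driver_after > driver_before:
    raise RuntimeError(
        f"possible CUDA memory leak: allocator {alloc_before} -> {alloc_after}, "
        f"driver {driver_before} -> {driver_after}"
    )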
2025-12-04T13:38:32.0281665Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0281861Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0282225Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0282339Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0282554Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0282720Z [rank3]:E1204 13:04:15.502000 384981 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0282772Z dist init r=3, world=4 2025-12-04T13:38:32.0282909Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0283070Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0283359Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0283514Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0283813Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0283938Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0284215Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0284361Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0284650Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0284799Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0285075Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0285212Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0285489Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0285661Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0286142Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:38:32.0286257Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0286453Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0286815Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0286931Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0287152Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0287318Z [rank1]:E1204 13:04:15.556000 384979 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0287356Z dist init r=1, world=4 2025-12-04T13:38:32.0287494Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0287652Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0287951Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0288108Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0288392Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0288516Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0288810Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0288959Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0289238Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0289388Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0289708Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0289857Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0290136Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0290284Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0290763Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0290878Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0291075Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0291449Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0291561Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0291775Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0291939Z [rank0]:E1204 13:04:15.567000 384978 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0291980Z dist init r=0, world=4 2025-12-04T13:38:32.0292115Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0292290Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0292578Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0292730Z [rank2]:E1204 13:04:15.570000 384980 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0293016Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0293152Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0293432Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0293578Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0293854Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0294002Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0294293Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0294430Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0294706Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0294854Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0295331Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:38:32.0295449Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0295655Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0296014Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0296127Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0296337Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0296512Z [rank2]:E1204 13:04:15.570000 384980 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0296553Z dist init r=2, world=4 2025-12-04T13:38:32.0296897Z [rank0]:[W1204 13:04:15.841703378 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0296937Z FAILED [39.5600s] [100%] 2025-12-04T13:38:32.0296939Z 2025-12-04T13:38:32.0297000Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0297116Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.0297164Z Traceback (most recent call last): 2025-12-04T13:38:32.0297330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0297375Z self._join_processes(fn) 2025-12-04T13:38:32.0297554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0297608Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0297790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0297834Z raise RuntimeError(error) 2025-12-04T13:38:32.0297918Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0297964Z Traceback (most recent call last): 2025-12-04T13:38:32.0298129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0298183Z getattr(self, test_name)() 2025-12-04T13:38:32.0298345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0298380Z fn() 2025-12-04T13:38:32.0298538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0298580Z method(*args, **kwargs) 2025-12-04T13:38:32.0298735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0298776Z method(*args, **kwargs) 2025-12-04T13:38:32.0298931Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0298969Z with policy(): 2025-12-04T13:38:32.0299127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0299169Z raise RuntimeError(msg) 2025-12-04T13:38:32.0299525Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0299538Z 2025-12-04T13:38:32.0299817Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0300052Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0300055Z 2025-12-04T13:38:32.0300145Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0300148Z 2025-12-04T13:38:32.0300150Z 2025-12-04T13:38:32.0300226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0300316Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0300564Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-159aa8b84e6f8c02.xml - 2025-12-04T13:38:32.0300631Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0300884Z FAILED [39.5600s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0300934Z Traceback (most recent call last): 2025-12-04T13:38:32.0301101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0301158Z getattr(self, test_name)() 2025-12-04T13:38:32.0301323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0301358Z fn() 2025-12-04T13:38:32.0301515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0301557Z method(*args, **kwargs) 2025-12-04T13:38:32.0301713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0301756Z method(*args, **kwargs) 2025-12-04T13:38:32.0301910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0301948Z with policy(): 2025-12-04T13:38:32.0302104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0302146Z raise RuntimeError(msg) 2025-12-04T13:38:32.0302519Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. 
CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0302522Z 2025-12-04T13:38:32.0302596Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0302836Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0302839Z 2025-12-04T13:38:32.0302928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0302991Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0303057Z ====================== 1 failed, 32 deselected in 39.72s ======================= 2025-12-04T13:38:32.0303097Z Got exit code 1 2025-12-04T13:38:32.0303142Z Retrying single test... 2025-12-04T13:38:32.0303334Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2cc526eae4caa98d.xml 2025-12-04T13:38:32.0303397Z ============================= test session starts ============================== 2025-12-04T13:38:32.0303525Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0303570Z cachedir: .pytest_cache 2025-12-04T13:38:32.0303795Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0303845Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0303886Z configfile: pytest.ini 2025-12-04T13:38:32.0304052Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0304127Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0304360Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0304404Z Running 1 items in this shard 2025-12-04T13:38:32.0304426Z 2025-12-04T13:38:32.0304746Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda I1204 13:04:20.037000 385311 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 385380 2025-12-04T13:38:32.0304908Z I1204 13:04:20.037000 385311 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 385381 2025-12-04T13:38:32.0305059Z I1204 13:04:20.038000 385311 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 385382 2025-12-04T13:38:32.0305227Z I1204 13:04:20.038000 385311 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 385383 2025-12-04T13:38:32.0305811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0305853Z _warn_cpu_init() 2025-12-04T13:38:32.0306422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0306479Z _warn_cpu_init() 2025-12-04T13:38:32.0307052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0307089Z _warn_cpu_init() 2025-12-04T13:38:32.0307652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0307691Z _warn_cpu_init() 2025-12-04T13:38:32.0307998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.0308045Z return func(*args, **kwargs) 2025-12-04T13:38:32.0308188Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0308353Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0308644Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0308804Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0309101Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0309230Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0309507Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0309698Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0309995Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0310142Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0310422Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0310558Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0310838Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0310998Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0311487Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:38:32.0311609Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0311803Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0312172Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0312286Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0312515Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0312682Z [rank0]:E1204 13:04:57.482000 385380 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0312724Z dist init r=0, world=4 2025-12-04T13:38:32.0312864Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0313024Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0313327Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0313481Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0313768Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0313892Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0314182Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0314331Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0314609Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0314759Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0315034Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0315183Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0315463Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0315615Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0316097Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:38:32.0316217Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0316416Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0316790Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0316905Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0317117Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0317286Z [rank2]:E1204 13:04:57.484000 385382 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0317326Z dist init r=2, world=4 2025-12-04T13:38:32.0317467Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0317641Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0317927Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0318082Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0318366Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0318504Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0318784Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0318934Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0319209Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0319358Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0319686Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0319822Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0320101Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0320250Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0320732Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0320851Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0321065Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0321433Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0321546Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0321760Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0321938Z [rank3]:E1204 13:04:57.522000 385383 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0321984Z dist init r=3, world=4 2025-12-04T13:38:32.0322121Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0322283Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0322574Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0322741Z [rank1]:E1204 13:04:57.536000 385381 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0323031Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0323155Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0323437Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0323584Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0323879Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0324031Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0324308Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0324446Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0324727Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0324881Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0325371Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
2025-12-04T13:38:32.0325489Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0325687Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0326049Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0326175Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0326386Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0326553Z [rank1]:E1204 13:04:57.536000 385381 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0326592Z dist init r=1, world=4 2025-12-04T13:38:32.0326930Z [rank0]:[W1204 13:04:57.668643805 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0326984Z FAILED [39.2612s] [100%] 2025-12-04T13:38:32.0326989Z 2025-12-04T13:38:32.0327046Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0327154Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.0327202Z Traceback (most recent call last): 2025-12-04T13:38:32.0327369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0327412Z self._join_processes(fn) 2025-12-04T13:38:32.0327589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0327645Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0327830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0327886Z raise RuntimeError(error) 2025-12-04T13:38:32.0327972Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0328017Z Traceback (most recent call last): 2025-12-04T13:38:32.0328184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0328227Z getattr(self, test_name)() 2025-12-04T13:38:32.0328388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0328423Z fn() 2025-12-04T13:38:32.0328577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0328619Z method(*args, **kwargs) 2025-12-04T13:38:32.0328775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0328817Z method(*args, **kwargs) 2025-12-04T13:38:32.0328973Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0329010Z with policy(): 2025-12-04T13:38:32.0329177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0329219Z raise RuntimeError(msg) 2025-12-04T13:38:32.0329619Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0329621Z 2025-12-04T13:38:32.0329701Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0329938Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0329942Z 2025-12-04T13:38:32.0330033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0330035Z 2025-12-04T13:38:32.0330051Z 2025-12-04T13:38:32.0330127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0330218Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0330451Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2cc526eae4caa98d.xml - 2025-12-04T13:38:32.0330515Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0330764Z FAILED [39.2612s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0330826Z Traceback (most recent call last): 2025-12-04T13:38:32.0330994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0331037Z getattr(self, test_name)() 2025-12-04T13:38:32.0331202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0331237Z fn() 2025-12-04T13:38:32.0331393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0331434Z method(*args, **kwargs) 2025-12-04T13:38:32.0331588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0331630Z method(*args, **kwargs) 2025-12-04T13:38:32.0331800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0331837Z with policy(): 2025-12-04T13:38:32.0331995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0332037Z raise RuntimeError(msg) 2025-12-04T13:38:32.0332399Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. 
CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0332401Z 2025-12-04T13:38:32.0332476Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0332714Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0332718Z 2025-12-04T13:38:32.0332808Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0332871Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0332939Z ====================== 1 failed, 32 deselected in 39.43s ======================= 2025-12-04T13:38:32.0332976Z Got exit code 1 2025-12-04T13:38:32.0333178Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0333307Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0333497Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-75e1e2e1e2b3ee4f.xml 2025-12-04T13:38:32.0333556Z ============================= test session starts ============================== 2025-12-04T13:38:32.0333674Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0333716Z cachedir: .pytest_cache 2025-12-04T13:38:32.0333878Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0333934Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0333980Z configfile: pytest.ini 2025-12-04T13:38:32.0334145Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0334222Z collecting ... collected 60 items / 7 deselected / 53 selected 2025-12-04T13:38:32.0334275Z stepcurrent: skipping 7 already run items. 2025-12-04T13:38:32.0334323Z Running 26 items in this shard 2025-12-04T13:38:32.0334325Z 2025-12-04T13:38:32.0334641Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda I1204 13:05:01.959000 385713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 385782 2025-12-04T13:38:32.0334811Z I1204 13:05:01.960000 385713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 385783 2025-12-04T13:38:32.0334968Z I1204 13:05:01.960000 385713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 385784 2025-12-04T13:38:32.0335120Z I1204 13:05:01.960000 385713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 385785 2025-12-04T13:38:32.0335419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
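Annotation: the FutureWarning above points at DistributedDataParallel as the replacement for FSDP's deprecated NO_SHARD strategy. A minimal sketch of that swap (assumes an initialized process group and one visible GPU; the Linear module is a hypothetical stand-in):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

model = torch.nn.Linear(8, 8).cuda()                                # hypothetical module on the current GPU
ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])    # DDP replicates instead of sharding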
2025-12-04T13:38:32.0335471Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0336048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0336106Z _warn_cpu_init() 2025-12-04T13:38:32.0336402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0336486Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0336772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0336829Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0337112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0337164Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0337749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0337792Z _warn_cpu_init() 2025-12-04T13:38:32.0338379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0338419Z _warn_cpu_init() 2025-12-04T13:38:32.0338712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0338761Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0339339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0339390Z _warn_cpu_init() 2025-12-04T13:38:32.0339722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0339804Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0340093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0340173Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0340474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0340552Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0341831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0341974Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0342207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0342250Z return func(*args, **kwargs) 2025-12-04T13:38:32.0343521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
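[editor note] The AccumulateGrad stream-mismatch UserWarning above names its own off-switch for the case where the cross-stream accumulation is intentional. The one-line sketch below only repeats the call quoted in the warning; whether silencing it is appropriate for this test is a separate judgment.

    import torch

    # Quoted verbatim in the warning above; suppresses the AccumulateGrad
    # stream-mismatch UserWarning when the mismatch is intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)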
2025-12-04T13:38:32.0343652Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0344910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0345045Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0345285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0345328Z return func(*args, **kwargs) 2025-12-04T13:38:32.0345558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0345603Z return func(*args, **kwargs) 2025-12-04T13:38:32.0346867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0346992Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0347226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0347268Z return func(*args, **kwargs) 2025-12-04T13:38:32.0347495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.0347537Z return func(*args, **kwargs) 2025-12-04T13:38:32.0347770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0347813Z return func(*args, **kwargs) 2025-12-04T13:38:32.0348036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0348077Z return func(*args, **kwargs) 2025-12-04T13:38:32.0348299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0348350Z return func(*args, **kwargs) 2025-12-04T13:38:32.0348646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0348688Z return func(*args, **kwargs) 2025-12-04T13:38:32.0348836Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0349003Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0349295Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0349455Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0349794Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0349925Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0350206Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0350360Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0350638Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0350793Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0351090Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0351229Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0351512Z 
[rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0351661Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0352172Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.0352292Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0352488Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0352855Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0352988Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0353206Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0353372Z [rank1]:E1204 13:05:09.492000 385783 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0353415Z dist init r=1, world=4 2025-12-04T13:38:32.0353552Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0353715Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0354006Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0354178Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0354469Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0354592Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0354874Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0355023Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.0355301Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0355461Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0355736Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0355875Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0356156Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0356319Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0356807Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 162304 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:38:32.0356924Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0370140Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0370557Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0370686Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0370909Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0371085Z [rank0]:E1204 13:05:09.539000 385782 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0371131Z dist init r=0, world=4 2025-12-04T13:38:32.0371280Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0371509Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0371813Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0371973Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.0372267Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0372398Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0372680Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0372850Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0373127Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0373279Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0373558Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0373699Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0374001Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0374150Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0374645Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 
2025-12-04T13:38:32.0374780Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0374983Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0375353Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0375468Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0375684Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0375862Z [rank2]:E1204 13:05:09.543000 385784 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0375907Z dist init r=2, world=4 2025-12-04T13:38:32.0376046Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0376209Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0376498Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0376655Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0376951Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0377078Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0377370Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0377520Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0377803Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0377959Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0378254Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0378394Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0378672Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0378823Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0379325Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.0379445Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0379691Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0380060Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0380195Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0380407Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0380575Z [rank3]:E1204 13:05:09.546000 385785 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0380616Z dist init r=3, world=4 2025-12-04T13:38:32.0380962Z [rank0]:[W1204 13:05:09.790562524 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0381003Z FAILED [9.5196s] [ 3%] 2025-12-04T13:38:32.0381008Z 2025-12-04T13:38:32.0381071Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0381178Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.0381230Z Traceback (most recent call last): 2025-12-04T13:38:32.0381399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0381462Z self._join_processes(fn) 2025-12-04T13:38:32.0381637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0381696Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0381878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0381925Z raise RuntimeError(error) 2025-12-04T13:38:32.0382011Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0382059Z Traceback (most recent call last): 2025-12-04T13:38:32.0382224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0382268Z getattr(self, test_name)() 2025-12-04T13:38:32.0382444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0382480Z fn() 2025-12-04T13:38:32.0382637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0382679Z method(*args, **kwargs) 2025-12-04T13:38:32.0382835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0382876Z method(*args, **kwargs) 2025-12-04T13:38:32.0383045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0383084Z with policy(): 2025-12-04T13:38:32.0383242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0383284Z raise RuntimeError(msg) 2025-12-04T13:38:32.0383649Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 
2025-12-04T13:38:32.0383652Z 2025-12-04T13:38:32.0383729Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0383973Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0383977Z 2025-12-04T13:38:32.0384078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0384080Z 2025-12-04T13:38:32.0384082Z 2025-12-04T13:38:32.0384163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0384254Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0384489Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-75e1e2e1e2b3ee4f.xml - 2025-12-04T13:38:32.0384553Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0384814Z FAILED [9.5196s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0384865Z Traceback (most recent call last): 2025-12-04T13:38:32.0385034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0385080Z getattr(self, test_name)() 2025-12-04T13:38:32.0385242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0385283Z fn() 2025-12-04T13:38:32.0406043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0406102Z method(*args, **kwargs) 2025-12-04T13:38:32.0406261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0406301Z method(*args, **kwargs) 2025-12-04T13:38:32.0406457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0406496Z with policy(): 2025-12-04T13:38:32.0406650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0406690Z raise RuntimeError(msg) 2025-12-04T13:38:32.0407076Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 160256 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.0407079Z 2025-12-04T13:38:32.0407155Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0407393Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0407396Z 2025-12-04T13:38:32.0407482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0407562Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.0407630Z ======================= 1 failed, 7 deselected in 9.66s ======================== 2025-12-04T13:38:32.0407666Z Got exit code 1 2025-12-04T13:38:32.0407706Z Retrying single test... 2025-12-04T13:38:32.0407897Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c2bb62c1be351938.xml 2025-12-04T13:38:32.0407957Z ============================= test session starts ============================== 2025-12-04T13:38:32.0408071Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0408111Z cachedir: .pytest_cache 2025-12-04T13:38:32.0408269Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0408316Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0408356Z configfile: pytest.ini 2025-12-04T13:38:32.0408541Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0408615Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0408846Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0408890Z Running 1 items in this shard 2025-12-04T13:38:32.0408892Z 2025-12-04T13:38:32.0409206Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda I1204 13:05:14.133000 386115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 386184 2025-12-04T13:38:32.0409360Z I1204 13:05:14.133000 386115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 386185 2025-12-04T13:38:32.0409511Z I1204 13:05:14.134000 386115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 386186 2025-12-04T13:38:32.0409705Z I1204 13:05:14.134000 386115 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 386187 2025-12-04T13:38:32.0410014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0410066Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0410643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0410683Z _warn_cpu_init() 2025-12-04T13:38:32.0410985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0411035Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0411600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0411650Z _warn_cpu_init() 2025-12-04T13:38:32.0411937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0412017Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0412305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0412379Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0412667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0412714Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0413287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0413338Z _warn_cpu_init() 2025-12-04T13:38:32.0413621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0413668Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0414234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0414272Z _warn_cpu_init() 2025-12-04T13:38:32.0414569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0414642Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0414927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0414999Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0416277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0416422Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0416650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0416694Z return func(*args, **kwargs) 2025-12-04T13:38:32.0417947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0418082Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0418309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0418350Z return func(*args, **kwargs) 2025-12-04T13:38:32.0419656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0419778Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0420006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0420046Z return func(*args, **kwargs) 2025-12-04T13:38:32.0421304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0421436Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0421662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0421702Z return func(*args, **kwargs) 2025-12-04T13:38:32.0421922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0421962Z return func(*args, **kwargs) 2025-12-04T13:38:32.0422181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0422235Z return func(*args, **kwargs) 2025-12-04T13:38:32.0422457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.0422497Z return func(*args, **kwargs) 2025-12-04T13:38:32.0422716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0422756Z return func(*args, **kwargs) 2025-12-04T13:38:32.0423051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0423092Z return func(*args, **kwargs) 2025-12-04T13:38:32.0423239Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0423403Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0423715Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0423871Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0424155Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0424280Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0424567Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0424718Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0424998Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0425146Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0425436Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0425576Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0425858Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0426009Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0426499Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver 
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.0426629Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0426829Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0427194Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0427311Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0427525Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0427699Z [rank3]:E1204 13:05:21.719000 386187 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0427738Z dist init r=3, world=4 2025-12-04T13:38:32.0427890Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0428051Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0428342Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0428497Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0428783Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0428919Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0429195Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0429345Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0429654Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0429820Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0430097Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0430233Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0430514Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0430661Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0431164Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:38:32.0431277Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0431477Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0431842Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0431956Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0432177Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0432355Z [rank0]:E1204 13:05:21.771000 386184 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0432396Z dist init r=0, world=4 2025-12-04T13:38:32.0432533Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0432694Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0432984Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0433154Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0433441Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0433564Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0433842Z [rank1]:E1204 13:05:21.794000 386185 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0434001Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0434278Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0434426Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0434703Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0434841Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0435124Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0435287Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0435771Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 
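[editor note] The RuntimeError above is raised by the harness's CUDA memory-leak check, enabled in this shard via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1; it compares per-device allocations before and after the test body. The sketch below is only an illustration of that kind of before/after comparison, not the harness's actual implementation; the device index and threshold are assumptions.

    import torch

    def snapshot(device: int):
        # Caching-allocator bytes (the "allocated memory was 512 ..." figure) and
        # driver-level usage (total - free, roughly the "CUDA driver allocated memory" figure).
        free, total = torch.cuda.mem_get_info(device)
        return torch.cuda.memory_allocated(device), total - free

    before = snapshot(0)
    # ... run the test body on device 0 ...
    torch.cuda.synchronize(0)
    after = snapshot(0)
    if after[0] > before[0]:
        raise RuntimeError(
            f"possible leak: caching allocator grew from {before[0]} to {after[0]} bytes"
        )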
2025-12-04T13:38:32.0435887Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0436082Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0436447Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0436572Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0436782Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0436947Z [rank1]:E1204 13:05:21.794000 386185 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0436986Z dist init r=1, world=4 2025-12-04T13:38:32.0437125Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0437285Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0437591Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0437744Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0438030Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0438166Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0438445Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0438594Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0438872Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0439021Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0439295Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0439450Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0439766Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0439915Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0440407Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:38:32.0440523Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0440721Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0441098Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0441212Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0441423Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0441590Z [rank2]:E1204 13:05:21.802000 386186 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0441631Z dist init r=2, world=4 2025-12-04T13:38:32.0441983Z [rank0]:[W1204 13:05:22.019907292 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0442026Z FAILED [9.5210s] [100%] 2025-12-04T13:38:32.0442029Z 2025-12-04T13:38:32.0442087Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0442196Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.0442245Z Traceback (most recent call last): 2025-12-04T13:38:32.0442425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0442470Z self._join_processes(fn) 2025-12-04T13:38:32.0442646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0442702Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0442883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0442929Z raise RuntimeError(error) 2025-12-04T13:38:32.0443010Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0443058Z Traceback (most recent call last): 2025-12-04T13:38:32.0443220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0443268Z getattr(self, test_name)() 2025-12-04T13:38:32.0443439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0443477Z fn() 2025-12-04T13:38:32.0443630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0443675Z method(*args, **kwargs) 2025-12-04T13:38:32.0443828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0443872Z method(*args, **kwargs) 2025-12-04T13:38:32.0444024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0444064Z with policy(): 2025-12-04T13:38:32.0444217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0444262Z raise RuntimeError(msg) 2025-12-04T13:38:32.0444621Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 
2025-12-04T13:38:32.0444625Z 2025-12-04T13:38:32.0444703Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0444953Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0444958Z 2025-12-04T13:38:32.0445046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0445048Z 2025-12-04T13:38:32.0445050Z 2025-12-04T13:38:32.0445133Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0445222Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0445461Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c2bb62c1be351938.xml - 2025-12-04T13:38:32.0445521Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0445789Z FAILED [9.5210s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0445835Z Traceback (most recent call last): 2025-12-04T13:38:32.0446002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0446043Z getattr(self, test_name)() 2025-12-04T13:38:32.0446207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0446253Z fn() 2025-12-04T13:38:32.0446407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0446446Z method(*args, **kwargs) 2025-12-04T13:38:32.0446601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0446643Z method(*args, **kwargs) 2025-12-04T13:38:32.0446797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0446836Z with policy(): 2025-12-04T13:38:32.0446989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0447033Z raise RuntimeError(msg) 2025-12-04T13:38:32.0447402Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 164352 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.0447416Z 2025-12-04T13:38:32.0447493Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0447732Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0447734Z 2025-12-04T13:38:32.0447823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0447886Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
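Note on the failure mode above: this shard runs the mem_leak_check variant of the job, so each test body executes inside a check that records per-device memory counters before the test and compares them afterwards, raising the RuntimeError seen in every rank's traceback when both the caching-allocator and driver-level numbers grow. A minimal sketch of that kind of before/after comparison, using only public torch.cuda calls; the helper name and structure are illustrative and not the actual torch.testing._internal.common_utils.CudaMemoryLeakCheck implementation:

import torch

def assert_no_leak(device: int, run_test) -> None:
    # Illustrative before/after memory comparison (hypothetical helper).
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes in use
    free_before, total = torch.cuda.mem_get_info(device)  # driver-level free/total bytes
    driver_before = total - free_before

    run_test()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()                               # release cached blocks first
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver "
            f"{driver_before} -> {driver_after} bytes"
        )

The repro command printed in the log (PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...) enables this class of check when rerunning the test outside CI.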
2025-12-04T13:38:32.0447951Z ======================= 1 failed, 32 deselected in 9.67s ======================= 2025-12-04T13:38:32.0447989Z Got exit code 1 2025-12-04T13:38:32.0448033Z Retrying single test... 2025-12-04T13:38:32.0448223Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0435229ba06dfe2d.xml 2025-12-04T13:38:32.0448285Z ============================= test session starts ============================== 2025-12-04T13:38:32.0448401Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0448443Z cachedir: .pytest_cache 2025-12-04T13:38:32.0448615Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0448663Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0448705Z configfile: pytest.ini 2025-12-04T13:38:32.0448870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0448947Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0449173Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0449222Z Running 1 items in this shard 2025-12-04T13:38:32.0449224Z 2025-12-04T13:38:32.0449554Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda I1204 13:05:26.331000 386517 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 386586 2025-12-04T13:38:32.0449761Z I1204 13:05:26.332000 386517 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 386587 2025-12-04T13:38:32.0449914Z I1204 13:05:26.333000 386517 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 386588 2025-12-04T13:38:32.0450066Z I1204 13:05:26.333000 386517 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 386589 2025-12-04T13:38:32.0450377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0450429Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0450721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0450771Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0451358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0451411Z _warn_cpu_init() 2025-12-04T13:38:32.0451984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0452024Z _warn_cpu_init() 2025-12-04T13:38:32.0452314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0452394Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0452682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0452760Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0453059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0453112Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0453681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0453722Z _warn_cpu_init() 2025-12-04T13:38:32.0454025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0454074Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0454648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0454704Z _warn_cpu_init() 2025-12-04T13:38:32.0454995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0455071Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0455362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0455440Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0456707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0456864Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0457096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0457142Z return func(*args, **kwargs) 2025-12-04T13:38:32.0458411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0458537Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0458772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0458820Z return func(*args, **kwargs) 2025-12-04T13:38:32.0460116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0460253Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0460480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0460522Z return func(*args, **kwargs) 2025-12-04T13:38:32.0461771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0461909Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0462143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0462187Z return func(*args, **kwargs) 2025-12-04T13:38:32.0462423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0462467Z return func(*args, **kwargs) 2025-12-04T13:38:32.0462688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0462731Z return func(*args, **kwargs) 2025-12-04T13:38:32.0462953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.0462994Z return func(*args, **kwargs) 2025-12-04T13:38:32.0463216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.0463271Z return func(*args, **kwargs) 2025-12-04T13:38:32.0463567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0463607Z return func(*args, **kwargs) 2025-12-04T13:38:32.0463753Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0463915Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0464222Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0464380Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0464669Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0464796Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0465073Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0465239Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0465516Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0465666Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0465941Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0466081Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0466369Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0466521Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0467022Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver 
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.0467138Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0467340Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0467722Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0467838Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0468052Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0468217Z [rank3]:E1204 13:05:33.796000 386589 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0468270Z dist init r=3, world=4 2025-12-04T13:38:32.0468410Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0468571Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0468861Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0469017Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0469302Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0469438Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0469762Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0469913Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0470192Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0470337Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0470617Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0470754Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0471052Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0471203Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0471688Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:38:32.0471813Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0472029Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0472395Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0472511Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0472725Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0472907Z [rank0]:E1204 13:05:33.846000 386586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0472946Z dist init r=0, world=4 2025-12-04T13:38:32.0473088Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0473247Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0473537Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0473691Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0473994Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0474123Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0474401Z [rank2]:E1204 13:05:33.848000 386588 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0474552Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0474830Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0474983Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0475271Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0475410Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0475687Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0475839Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0476338Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 
2025-12-04T13:38:32.0476452Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0476652Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0477013Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0477140Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0477353Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0477518Z [rank2]:E1204 13:05:33.848000 386588 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0477559Z dist init r=2, world=4 2025-12-04T13:38:32.0477696Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0477858Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0478146Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0478315Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0478602Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0478728Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0479009Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0479157Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0479436Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0479629Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0479911Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0480048Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0480330Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0480495Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0480976Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 156160 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.0481093Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0481301Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0481667Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0481781Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0481995Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0482163Z [rank1]:E1204 13:05:33.859000 386587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0482203Z dist init r=1, world=4 2025-12-04T13:38:32.0482543Z [rank0]:[W1204 13:05:34.113669637 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0482596Z FAILED [9.4192s] [100%] 2025-12-04T13:38:32.0482599Z 2025-12-04T13:38:32.0482660Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0482765Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.0482816Z Traceback (most recent call last): 2025-12-04T13:38:32.0482979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0483028Z self._join_processes(fn) 2025-12-04T13:38:32.0483201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0483261Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0483440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0483488Z raise RuntimeError(error) 2025-12-04T13:38:32.0483569Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0483618Z Traceback (most recent call last): 2025-12-04T13:38:32.0483792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0483839Z getattr(self, test_name)() 2025-12-04T13:38:32.0483999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0484037Z fn() 2025-12-04T13:38:32.0484195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0484238Z method(*args, **kwargs) 2025-12-04T13:38:32.0484395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0484438Z method(*args, **kwargs) 2025-12-04T13:38:32.0484602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0484642Z with policy(): 2025-12-04T13:38:32.0484798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0484840Z raise RuntimeError(msg) 2025-12-04T13:38:32.0485203Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 
2025-12-04T13:38:32.0485217Z 2025-12-04T13:38:32.0485293Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0485533Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0485536Z 2025-12-04T13:38:32.0485624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0485630Z 2025-12-04T13:38:32.0485632Z 2025-12-04T13:38:32.0485708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0485801Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0486037Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0435229ba06dfe2d.xml - 2025-12-04T13:38:32.0486104Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0486371Z FAILED [9.4192s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0486422Z Traceback (most recent call last): 2025-12-04T13:38:32.0486586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0486633Z getattr(self, test_name)() 2025-12-04T13:38:32.0486794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0486832Z fn() 2025-12-04T13:38:32.0486984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0487028Z method(*args, **kwargs) 2025-12-04T13:38:32.0487179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0487224Z method(*args, **kwargs) 2025-12-04T13:38:32.0487375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0487416Z with policy(): 2025-12-04T13:38:32.0487598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0487645Z raise RuntimeError(msg) 2025-12-04T13:38:32.0488011Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.0488013Z 2025-12-04T13:38:32.0488089Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0488328Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0488331Z 2025-12-04T13:38:32.0488417Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0488495Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
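Note on the repeated [rank0] ProcessGroupNCCL warning above: it reports that destroy_process_group() was not called before program exit, which can leak communicator resources. A minimal sketch of the explicit-teardown pattern the warning asks for, using the public torch.distributed API; the main() scaffolding is illustrative and not part of the test harness, which manages its own process groups:

import torch.distributed as dist

def main() -> None:
    # "nccl" backend, which on this ROCm build corresponds to the
    # ProcessGroupNCCL path that emits the warning in the log.
    dist.init_process_group("nccl")
    try:
        ...  # test or training body
    finally:
        # Explicit teardown releases communicator resources and avoids the
        # "destroy_process_group() was not called before program exit" warning.
        dist.destroy_process_group()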
2025-12-04T13:38:32.0488559Z ======================= 1 failed, 32 deselected in 9.58s ======================= 2025-12-04T13:38:32.0488600Z Got exit code 1 2025-12-04T13:38:32.0488786Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda 2025-12-04T13:38:32.0488917Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0489104Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3a97c4aa2fa32d8f.xml 2025-12-04T13:38:32.0489182Z ============================= test session starts ============================== 2025-12-04T13:38:32.0489299Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0489342Z cachedir: .pytest_cache 2025-12-04T13:38:32.0489506Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0489554Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0489629Z configfile: pytest.ini 2025-12-04T13:38:32.0489794Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0489871Z collecting ... collected 60 items / 8 deselected / 52 selected 2025-12-04T13:38:32.0489925Z stepcurrent: skipping 8 already run items. 2025-12-04T13:38:32.0489973Z Running 25 items in this shard 2025-12-04T13:38:32.0489976Z 2025-12-04T13:38:32.0490302Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda I1204 13:05:38.381000 386919 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 386988 2025-12-04T13:38:32.0490464Z I1204 13:05:38.382000 386919 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 386989 2025-12-04T13:38:32.0490616Z I1204 13:05:38.382000 386919 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 386990 2025-12-04T13:38:32.0490769Z I1204 13:05:38.383000 386919 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 386991 2025-12-04T13:38:32.0491346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0491390Z _warn_cpu_init() 2025-12-04T13:38:32.0491971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0492010Z _warn_cpu_init() 2025-12-04T13:38:32.0492577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0492619Z _warn_cpu_init() 2025-12-04T13:38:32.0493203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0493244Z _warn_cpu_init() 2025-12-04T13:38:32.0493537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0493604Z return func(*args, **kwargs) 2025-12-04T13:38:32.0493747Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0493912Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0494200Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0494359Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0494646Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0494783Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0495065Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0495215Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0495497Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0495646Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0495928Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0496079Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0496357Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0496509Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0496987Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0497117Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0497315Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0497677Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0497807Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0498021Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0498191Z [rank0]:E1204 13:06:34.001000 386988 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0498232Z dist init r=0, world=4 2025-12-04T13:38:32.0498372Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0498535Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0498825Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0498991Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0499281Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0499410Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0499827Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0499980Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0500257Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0500407Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0500700Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0500843Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0501125Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0501278Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0501780Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:38:32.0501895Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0502093Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0502464Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0502582Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0502798Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0502964Z [rank1]:E1204 13:06:34.018000 386989 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0503008Z dist init r=1, world=4 2025-12-04T13:38:32.0503145Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0503308Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0503617Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0503776Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0504061Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0504188Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0504467Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0504615Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0504906Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0505055Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0505337Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0505474Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0505764Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0505917Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0506395Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0506523Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0506718Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0507079Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0507194Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0507406Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0507576Z [rank3]:E1204 13:06:34.025000 386991 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0507627Z dist init r=3, world=4 2025-12-04T13:38:32.0507769Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0507929Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0508220Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0508376Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0508667Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0508793Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0509086Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0509237Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0509511Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0509714Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0509995Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0510149Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0510427Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0510580Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0511060Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0514162Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0514364Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0514722Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0514840Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0515082Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0515248Z [rank2]:E1204 13:06:34.101000 386990 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0515290Z dist init r=2, world=4 2025-12-04T13:38:32.0515634Z [rank0]:[W1204 13:06:34.164417214 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0515681Z FAILED [57.5829s] [ 4%] 2025-12-04T13:38:32.0515683Z 2025-12-04T13:38:32.0515740Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0515843Z __ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda ___ 2025-12-04T13:38:32.0515893Z Traceback (most recent call last): 2025-12-04T13:38:32.0516063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0516109Z self._join_processes(fn) 2025-12-04T13:38:32.0516289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0516357Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0516542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0516588Z raise RuntimeError(error) 2025-12-04T13:38:32.0516670Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0516716Z Traceback (most recent call last): 2025-12-04T13:38:32.0516880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0516927Z getattr(self, test_name)() 2025-12-04T13:38:32.0517087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0517126Z fn() 2025-12-04T13:38:32.0517290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0517335Z method(*args, **kwargs) 2025-12-04T13:38:32.0517488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0517533Z method(*args, **kwargs) 2025-12-04T13:38:32.0517685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0517727Z with policy(): 2025-12-04T13:38:32.0517891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0517938Z raise RuntimeError(msg) 2025-12-04T13:38:32.0518291Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0518295Z 2025-12-04T13:38:32.0518376Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0518611Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0518613Z 2025-12-04T13:38:32.0518702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0518705Z 2025-12-04T13:38:32.0518768Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0518826Z Traceback (most recent call last): 2025-12-04T13:38:32.0518993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0519035Z getattr(self, test_name)() 2025-12-04T13:38:32.0519200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0519238Z fn() 2025-12-04T13:38:32.0519393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0519433Z method(*args, **kwargs) 2025-12-04T13:38:32.0519629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0519670Z method(*args, **kwargs) 2025-12-04T13:38:32.0519824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0519865Z with policy(): 2025-12-04T13:38:32.0520022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0520063Z raise RuntimeError(msg) 2025-12-04T13:38:32.0520439Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0520441Z 2025-12-04T13:38:32.0520518Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0520753Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0520756Z 2025-12-04T13:38:32.0520847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0520850Z 2025-12-04T13:38:32.0520852Z 2025-12-04T13:38:32.0520928Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0521019Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.0521270Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3a97c4aa2fa32d8f.xml - 2025-12-04T13:38:32.0521337Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0521594Z FAILED [57.5829s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0521644Z Traceback (most recent call last): 2025-12-04T13:38:32.0521831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0521878Z getattr(self, test_name)() 2025-12-04T13:38:32.0522038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0522077Z fn() 2025-12-04T13:38:32.0522231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0522275Z method(*args, **kwargs) 2025-12-04T13:38:32.0522427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0522472Z method(*args, **kwargs) 2025-12-04T13:38:32.0522626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0522664Z with policy(): 2025-12-04T13:38:32.0522821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0522892Z raise RuntimeError(msg) 2025-12-04T13:38:32.0523247Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0523250Z 2025-12-04T13:38:32.0523323Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0523556Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0523559Z 2025-12-04T13:38:32.0523645Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0523647Z 2025-12-04T13:38:32.0523709Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0523755Z Traceback (most recent call last): 2025-12-04T13:38:32.0523921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0523963Z getattr(self, test_name)() 2025-12-04T13:38:32.0524127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0524177Z fn() 2025-12-04T13:38:32.0524330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0524374Z method(*args, **kwargs) 2025-12-04T13:38:32.0524524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0524568Z method(*args, **kwargs) 2025-12-04T13:38:32.0524719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0524761Z with policy(): 2025-12-04T13:38:32.0524912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0524957Z raise RuntimeError(msg) 2025-12-04T13:38:32.0525315Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0525317Z 2025-12-04T13:38:32.0525394Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0525625Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0525640Z 2025-12-04T13:38:32.0525730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0525795Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0525862Z ======================= 1 failed, 8 deselected in 57.72s ======================= 2025-12-04T13:38:32.0525904Z Got exit code 1 2025-12-04T13:38:32.0525946Z Retrying single test... 
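Note: the leak check that fails above records per-device caching-allocator usage before the test body and compares it afterwards (512 bytes before vs 49664 bytes after on every rank), and the ProcessGroupNCCL warning points out that destroy_process_group() was never called before exit. The following is a rough, standalone sketch of that bookkeeping using public torch.cuda APIs; the helper name and the strict greater-than threshold are illustrative assumptions, not the actual implementation in common_utils.py.

import torch
import torch.distributed as dist

def run_with_leak_check(test_fn, device_index: int = 0) -> None:
    # Illustrative sketch only, not PyTorch's CUDA mem-leak-check policy.
    device = torch.device("cuda", device_index)
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)  # caching-allocator bytes before the test
    test_fn()
    if dist.is_initialized():
        # Explicit teardown, as requested by the ProcessGroupNCCL warning above.
        dist.destroy_process_group()
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)  # bytes still held after the test
    if after > before:
        raise RuntimeError(
            f"possible leak on {device}: {before} bytes before vs {after} bytes after"
        )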
2025-12-04T13:38:32.0526140Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4c3c53bf15bf011b.xml 2025-12-04T13:38:32.0526199Z ============================= test session starts ============================== 2025-12-04T13:38:32.0526319Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0526362Z cachedir: .pytest_cache 2025-12-04T13:38:32.0526523Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0526618Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0526662Z configfile: pytest.ini 2025-12-04T13:38:32.0526825Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0526904Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0527130Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0527176Z Running 1 items in this shard 2025-12-04T13:38:32.0527178Z 2025-12-04T13:38:32.0527490Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda I1204 13:06:38.614000 387321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 387390 2025-12-04T13:38:32.0527651Z I1204 13:06:38.615000 387321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 387391 2025-12-04T13:38:32.0527807Z I1204 13:06:38.615000 387321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 387392 2025-12-04T13:38:32.0527963Z I1204 13:06:38.616000 387321 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 387393 2025-12-04T13:38:32.0528559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0528599Z _warn_cpu_init() 2025-12-04T13:38:32.0529184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0529231Z _warn_cpu_init() 2025-12-04T13:38:32.0529889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0529946Z _warn_cpu_init() 2025-12-04T13:38:32.0530518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0530557Z _warn_cpu_init() 2025-12-04T13:38:32.0530850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0530893Z return func(*args, **kwargs) 2025-12-04T13:38:32.0531035Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0531223Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0531518Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0531673Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0531960Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0532084Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0532363Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0532513Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0532809Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0532957Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0533232Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0533371Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0533663Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0533813Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0534294Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0534424Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0534621Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0534979Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0535096Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0535309Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0535475Z [rank2]:E1204 13:07:34.328000 387392 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0535525Z dist init r=2, world=4 2025-12-04T13:38:32.0535665Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0535825Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0536114Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0536269Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0536561Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0536689Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0536976Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0537125Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0537400Z [rank0]:E1204 13:07:34.371000 387390 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0537549Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0537826Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0537978Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0538260Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0538409Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0538885Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0539013Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0539209Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0539566Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0539722Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0539954Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0540117Z [rank0]:E1204 13:07:34.371000 387390 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0540157Z dist init r=0, world=4 2025-12-04T13:38:32.0540294Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0540455Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0540745Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0540901Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0541186Z [rank1]:E1204 
13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0541322Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0541600Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0541747Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0542025Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0542185Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0542462Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0542599Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0542875Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0543039Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0543516Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:38:32.0543631Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0543827Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0544184Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0544309Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0544520Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0544685Z [rank1]:E1204 13:07:34.377000 387391 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0544722Z dist init r=1, world=4 2025-12-04T13:38:32.0544861Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0545021Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0545310Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0545476Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0545759Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0545883Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0546162Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0546312Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0546600Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0546749Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0547025Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0547171Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0547452Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0547601Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0548078Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0548190Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0548399Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0548760Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0548871Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0549084Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0549247Z [rank3]:E1204 13:07:34.380000 387393 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0549288Z dist init r=3, world=4 2025-12-04T13:38:32.0549663Z [rank0]:[W1204 13:07:34.621121381 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0549705Z FAILED [57.6789s] [100%] 2025-12-04T13:38:32.0549707Z 2025-12-04T13:38:32.0549777Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0549879Z __ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda ___ 2025-12-04T13:38:32.0549925Z Traceback (most recent call last): 2025-12-04T13:38:32.0550090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0550135Z self._join_processes(fn) 2025-12-04T13:38:32.0550308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0550365Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0550558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0550603Z raise RuntimeError(error) 2025-12-04T13:38:32.0550684Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0550731Z Traceback (most recent call last): 2025-12-04T13:38:32.0550892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0550936Z getattr(self, test_name)() 2025-12-04T13:38:32.0551093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0551143Z fn() 2025-12-04T13:38:32.0551295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0551338Z method(*args, **kwargs) 2025-12-04T13:38:32.0551490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0551533Z method(*args, **kwargs) 2025-12-04T13:38:32.0551686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0551724Z with policy(): 2025-12-04T13:38:32.0551877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0551920Z raise RuntimeError(msg) 2025-12-04T13:38:32.0552275Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0552293Z 2025-12-04T13:38:32.0552370Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0552605Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0552607Z 2025-12-04T13:38:32.0552695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0552697Z 2025-12-04T13:38:32.0552699Z 2025-12-04T13:38:32.0552777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0552865Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0553102Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4c3c53bf15bf011b.xml - 2025-12-04T13:38:32.0553163Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0553414Z FAILED [57.6789s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0553460Z Traceback (most recent call last): 2025-12-04T13:38:32.0553642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0553684Z getattr(self, test_name)() 2025-12-04T13:38:32.0553845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0553882Z fn() 2025-12-04T13:38:32.0554033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0554076Z method(*args, **kwargs) 2025-12-04T13:38:32.0554227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0554268Z method(*args, **kwargs) 2025-12-04T13:38:32.0554431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0554471Z with policy(): 2025-12-04T13:38:32.0554625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0554667Z raise RuntimeError(msg) 2025-12-04T13:38:32.0555021Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0555034Z 2025-12-04T13:38:32.0555111Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0555342Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0555344Z 2025-12-04T13:38:32.0555436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0555499Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.0555563Z ====================== 1 failed, 32 deselected in 57.84s ======================= 2025-12-04T13:38:32.0555602Z Got exit code 1 2025-12-04T13:38:32.0555642Z Retrying single test... 2025-12-04T13:38:32.0555833Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e63260897d77ebfd.xml 2025-12-04T13:38:32.0555892Z ============================= test session starts ============================== 2025-12-04T13:38:32.0556022Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0556063Z cachedir: .pytest_cache 2025-12-04T13:38:32.0556222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0556269Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0556311Z configfile: pytest.ini 2025-12-04T13:38:32.0556477Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0556552Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0556776Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0556820Z Running 1 items in this shard 2025-12-04T13:38:32.0556822Z 2025-12-04T13:38:32.0557131Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda I1204 13:07:38.814000 387723 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 387792 2025-12-04T13:38:32.0557289Z I1204 13:07:38.815000 387723 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 387793 2025-12-04T13:38:32.0557451Z I1204 13:07:38.815000 387723 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 387794 2025-12-04T13:38:32.0557606Z I1204 13:07:38.816000 387723 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 387795 2025-12-04T13:38:32.0558188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0558226Z _warn_cpu_init() 2025-12-04T13:38:32.0558808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0558845Z _warn_cpu_init() 2025-12-04T13:38:32.0559412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0559463Z _warn_cpu_init() 2025-12-04T13:38:32.0560065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0560104Z _warn_cpu_init() 2025-12-04T13:38:32.0560399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0560460Z return func(*args, **kwargs) 2025-12-04T13:38:32.0560602Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0560767Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0561057Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0561211Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0561497Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0561623Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0561916Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0562065Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0562342Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0562494Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0562771Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0562924Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0563201Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0563351Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0563829Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0563962Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0564159Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0564517Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0564633Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0564856Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0565023Z [rank1]:E1204 13:08:34.714000 387793 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0565062Z dist init r=1, world=4 2025-12-04T13:38:32.0565413Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0565572Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0565860Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0566017Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0566301Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0566437Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0566713Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0566861Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0567138Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0567299Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0567579Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0567714Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0567992Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0568152Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0568631Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.0568748Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0568942Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0569300Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0569431Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0569685Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0569851Z [rank2]:E1204 13:08:34.730000 387794 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0569892Z dist init r=2, world=4 2025-12-04T13:38:32.0570028Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0570192Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0570481Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0570652Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0570939Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0571062Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0571341Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0571489Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0571781Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0571927Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0572207Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0572356Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0572634Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0572784Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0573262Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0573379Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0573590Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0573947Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0574062Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0574273Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0574439Z [rank0]:E1204 13:08:34.762000 387792 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0574478Z dist init r=0, world=4 2025-12-04T13:38:32.0574617Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0574777Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0575076Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0575231Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0575516Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0575644Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0575933Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0576084Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0576359Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0576509Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0576796Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0576935Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0577214Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0577361Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0577841Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0577970Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0578168Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0578526Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0578638Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0578851Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0579016Z [rank3]:E1204 13:08:34.783000 387795 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0579056Z dist init r=3, world=4 2025-12-04T13:38:32.0579401Z [rank0]:[W1204 13:08:35.002714555 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0579444Z FAILED [57.7867s] [100%] 2025-12-04T13:38:32.0579446Z 2025-12-04T13:38:32.0579501Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0579648Z __ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda ___ 2025-12-04T13:38:32.0579696Z Traceback (most recent call last): 2025-12-04T13:38:32.0579863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0579905Z self._join_processes(fn) 2025-12-04T13:38:32.0580093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0580150Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0580327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0580372Z raise RuntimeError(error) 2025-12-04T13:38:32.0580451Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0580498Z Traceback (most recent call last): 2025-12-04T13:38:32.0580659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0580716Z getattr(self, test_name)() 2025-12-04T13:38:32.0580874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0580910Z fn() 2025-12-04T13:38:32.0581063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0581105Z method(*args, **kwargs) 2025-12-04T13:38:32.0581255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0581297Z method(*args, **kwargs) 2025-12-04T13:38:32.0581448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0581487Z with policy(): 2025-12-04T13:38:32.0581640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0581697Z raise RuntimeError(msg) 2025-12-04T13:38:32.0582048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:38:32.0582052Z 2025-12-04T13:38:32.0582127Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0582360Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0582362Z 2025-12-04T13:38:32.0582448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0582451Z 2025-12-04T13:38:32.0582452Z 2025-12-04T13:38:32.0582529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0582618Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0582851Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e63260897d77ebfd.xml - 2025-12-04T13:38:32.0582911Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0583183Z FAILED [57.7867s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0583228Z Traceback (most recent call last): 2025-12-04T13:38:32.0583392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0583434Z getattr(self, test_name)() 2025-12-04T13:38:32.0583596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0583633Z fn() 2025-12-04T13:38:32.0583786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0583827Z method(*args, **kwargs) 2025-12-04T13:38:32.0583990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0584032Z method(*args, **kwargs) 2025-12-04T13:38:32.0584184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0584224Z with policy(): 2025-12-04T13:38:32.0584377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0584420Z raise RuntimeError(msg) 2025-12-04T13:38:32.0584790Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0584794Z 2025-12-04T13:38:32.0584870Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0585101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0585103Z 2025-12-04T13:38:32.0585190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0585253Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
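[editor's note] The UserWarning repeated above on every rank recommends passing the `device_id` argument to FSDP so that sharding initialization runs on the GPU and `sync_module_states=True` can use GPU communication. A minimal sketch of that recommendation follows; the toy Linear module and rank handling are assumptions for illustration, not code from test_fsdp_core.py, and a process group is assumed to be initialized already.

    # Minimal sketch of the `device_id` recommendation from the FSDP warning
    # above. Toy model and rank handling are assumptions; assumes
    # init_process_group() has already been called on every rank.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        torch.cuda.set_device(rank)
        module = torch.nn.Linear(1024, 1024)  # constructed on CPU
        # device_id moves the module to this rank's GPU for sharding
        # initialization and makes sync_module_states=True usable.
        return FSDP(
            module,
            device_id=torch.cuda.current_device(),
            sync_module_states=True,
        )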
2025-12-04T13:38:32.0585317Z ====================== 1 failed, 32 deselected in 57.95s ======================= 2025-12-04T13:38:32.0585357Z Got exit code 1 2025-12-04T13:38:32.0585546Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda 2025-12-04T13:38:32.0585675Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0585866Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e18d1fda7ad478d.xml 2025-12-04T13:38:32.0585925Z ============================= test session starts ============================== 2025-12-04T13:38:32.0586039Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0586082Z cachedir: .pytest_cache 2025-12-04T13:38:32.0586238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0586286Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0586327Z configfile: pytest.ini 2025-12-04T13:38:32.0586493Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0586566Z collecting ... collected 60 items / 9 deselected / 51 selected 2025-12-04T13:38:32.0586624Z stepcurrent: skipping 9 already run items. 2025-12-04T13:38:32.0586667Z Running 24 items in this shard 2025-12-04T13:38:32.0586669Z 2025-12-04T13:38:32.0587000Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda I1204 13:08:39.178000 388125 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 388194 2025-12-04T13:38:32.0587156Z I1204 13:08:39.179000 388125 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 388195 2025-12-04T13:38:32.0587310Z I1204 13:08:39.179000 388125 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 388196 2025-12-04T13:38:32.0587466Z I1204 13:08:39.180000 388125 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 388197 2025-12-04T13:38:32.0588055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0588095Z _warn_cpu_init() 2025-12-04T13:38:32.0588664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0588715Z _warn_cpu_init() 2025-12-04T13:38:32.0589286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0589323Z _warn_cpu_init() 2025-12-04T13:38:32.0589928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0589982Z _warn_cpu_init() 2025-12-04T13:38:32.0590280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0590322Z return func(*args, **kwargs) 2025-12-04T13:38:32.0590467Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0590630Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0590917Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0591075Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0591375Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0591503Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0591786Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0591938Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0592233Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0592380Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0592656Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0592791Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0593085Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0593234Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0593727Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0593844Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0594040Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0594432Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0594548Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0594762Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0594926Z [rank0]:E1204 13:09:35.016000 388194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0594967Z dist init r=0, world=4 2025-12-04T13:38:32.0595104Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0595266Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0595565Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0595720Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0596010Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0596134Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0596416Z [rank2]:E1204 13:09:35.031000 388196 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0596575Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0596855Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0597004Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0597279Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0597428Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0597706Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0597857Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0598345Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.0598476Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0598674Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0599045Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0599159Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0599370Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0599536Z [rank2]:E1204 13:09:35.031000 388196 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0599612Z dist init r=2, world=4 2025-12-04T13:38:32.0599755Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0599929Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0600216Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0600371Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0600656Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0600799Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0601078Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0601226Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0601500Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0601662Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0601940Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0602076Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0602353Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0602501Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0603005Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0603121Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0603317Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0603693Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0603807Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0604021Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0604196Z [rank3]:E1204 13:09:35.078000 388197 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0604237Z dist init r=3, world=4 2025-12-04T13:38:32.0604373Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0604534Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0604819Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0604975Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0605274Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0605397Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0605676Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0605836Z [rank1]:E1204 13:09:35.083000 388195 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0606117Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0606264Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0606540Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0606677Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0606952Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0607114Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0607603Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0607720Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0607918Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0608294Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0608420Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0608632Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0608797Z [rank1]:E1204 13:09:35.083000 388195 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0608834Z dist init r=1, world=4 2025-12-04T13:38:32.0609176Z [rank0]:[W1204 13:09:35.206508919 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0609217Z FAILED [57.7858s] [ 4%] 2025-12-04T13:38:32.0609219Z 2025-12-04T13:38:32.0609288Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0609399Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.0609448Z Traceback (most recent call last): 2025-12-04T13:38:32.0609649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0609693Z self._join_processes(fn) 2025-12-04T13:38:32.0609868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0609936Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0610117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0610160Z raise RuntimeError(error) 2025-12-04T13:38:32.0610242Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0610287Z Traceback (most recent call last): 2025-12-04T13:38:32.0610450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0610491Z getattr(self, test_name)() 2025-12-04T13:38:32.0610651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0610685Z fn() 2025-12-04T13:38:32.0610838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0610879Z method(*args, **kwargs) 2025-12-04T13:38:32.0611046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0611085Z method(*args, **kwargs) 2025-12-04T13:38:32.0611238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0611275Z with policy(): 2025-12-04T13:38:32.0611431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0611471Z raise RuntimeError(msg) 2025-12-04T13:38:32.0611837Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0611840Z 2025-12-04T13:38:32.0611918Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0612162Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0612164Z 2025-12-04T13:38:32.0612254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0612257Z 2025-12-04T13:38:32.0612271Z 2025-12-04T13:38:32.0612346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0612435Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0612672Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e18d1fda7ad478d.xml - 2025-12-04T13:38:32.0612733Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0612993Z FAILED [57.7858s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0613041Z Traceback (most recent call last): 2025-12-04T13:38:32.0613220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0613263Z getattr(self, test_name)() 2025-12-04T13:38:32.0613425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0613459Z fn() 2025-12-04T13:38:32.0613612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0613651Z method(*args, **kwargs) 2025-12-04T13:38:32.0613804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0614928Z method(*args, **kwargs) 2025-12-04T13:38:32.0615080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0615117Z with policy(): 2025-12-04T13:38:32.0615272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0615313Z raise RuntimeError(msg) 2025-12-04T13:38:32.0615678Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0615681Z 2025-12-04T13:38:32.0615755Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0616003Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0616020Z 2025-12-04T13:38:32.0616109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0616173Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
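[editor's note] Two further warnings in the output above are actionable: barrier() suggests specifying `device_id` in `init_process_group`, and ProcessGroupNCCL warns that `destroy_process_group()` was not called before program exit. The sketch below shows one per-rank setup/teardown pattern that addresses both; the environment-variable handling assumes a torchrun-style launcher and is not taken from the failing test's harness.

    # Minimal sketch addressing the barrier() device warning and the
    # destroy_process_group() warning above. Assumes a torchrun-style
    # launcher sets LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT.
    import os
    import torch
    import torch.distributed as dist

    def main():
        rank = int(os.environ["LOCAL_RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        torch.cuda.set_device(rank)
        # Binding the process group to a device up front silences the
        # "barrier(): using the device under current context" warning.
        dist.init_process_group(
            "nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            dist.barrier()
            # ... per-rank work ...
        finally:
            # Explicit teardown avoids the ProcessGroupNCCL warning about
            # destroy_process_group() not being called before exit.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()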
2025-12-04T13:38:32.0616238Z ======================= 1 failed, 9 deselected in 57.93s ======================= 2025-12-04T13:38:32.0616275Z Got exit code 1 2025-12-04T13:38:32.0616317Z Retrying single test... 2025-12-04T13:38:32.0616507Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-74a9a6fcf6eb4745.xml 2025-12-04T13:38:32.0616567Z ============================= test session starts ============================== 2025-12-04T13:38:32.0616679Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0616724Z cachedir: .pytest_cache 2025-12-04T13:38:32.0616882Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0616931Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0616971Z configfile: pytest.ini 2025-12-04T13:38:32.0617156Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0617233Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0617470Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0617513Z Running 1 items in this shard 2025-12-04T13:38:32.0617515Z 2025-12-04T13:38:32.0617838Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda I1204 13:09:39.608000 388527 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 388596 2025-12-04T13:38:32.0617998Z I1204 13:09:39.608000 388527 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 388597 2025-12-04T13:38:32.0618161Z I1204 13:09:39.609000 388527 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 388598 2025-12-04T13:38:32.0618317Z I1204 13:09:39.609000 388527 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 388599 2025-12-04T13:38:32.0618898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0618950Z _warn_cpu_init() 2025-12-04T13:38:32.0619517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0619557Z _warn_cpu_init() 2025-12-04T13:38:32.0620170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0620223Z _warn_cpu_init() 2025-12-04T13:38:32.0620794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0620831Z _warn_cpu_init() 2025-12-04T13:38:32.0621124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0621170Z return func(*args, **kwargs) 2025-12-04T13:38:32.0621314Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0621479Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0621782Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0621939Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0622224Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0622352Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0622643Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0622798Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0623076Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0623222Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0623518Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0623655Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0623939Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0624086Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0624581Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0624709Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0624905Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0625280Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0625393Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0625606Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0625771Z [rank3]:E1204 13:10:36.113000 388599 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0625812Z dist init r=3, world=4 2025-12-04T13:38:32.0625961Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0626119Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0626410Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0626565Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0626864Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0626987Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0627265Z [rank2]:E1204 13:10:36.143000 388598 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0627413Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0627692Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0627853Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0628129Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0628267Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0628546Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0628697Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0629195Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.0629311Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0629507Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0629919Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0630035Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0630261Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0630426Z [rank2]:E1204 13:10:36.143000 388598 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0630465Z dist init r=2, world=4 2025-12-04T13:38:32.0630605Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0630766Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0631054Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0631222Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0631506Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0631632Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0631911Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0632076Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0632355Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0632501Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0637920Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0638073Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0638406Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0638555Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0639050Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0639165Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0639367Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0639810Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0639925Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0640141Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0640304Z [rank0]:E1204 13:10:36.171000 388596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0640348Z dist init r=0, world=4 2025-12-04T13:38:32.0640486Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0640666Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0640953Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0641113Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0641402Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0641543Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0641827Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0641976Z [rank1]:E1204 13:10:36.197000 388597 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0642257Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0642404Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0642685Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0642840Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0643120Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0643270Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0643758Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0643877Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0644086Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0644459Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0644576Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0644788Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0644958Z [rank1]:E1204 13:10:36.197000 388597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0645007Z dist init r=1, world=4 2025-12-04T13:38:32.0645350Z [rank0]:[W1204 13:10:36.425966146 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0645393Z FAILED [58.3860s] [100%] 2025-12-04T13:38:32.0645395Z 2025-12-04T13:38:32.0645455Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0645565Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.0645627Z Traceback (most recent call last): 2025-12-04T13:38:32.0645797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0645841Z self._join_processes(fn) 2025-12-04T13:38:32.0646019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0646075Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0646256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0646299Z raise RuntimeError(error) 2025-12-04T13:38:32.0646381Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0646426Z Traceback (most recent call last): 2025-12-04T13:38:32.0646592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0646646Z getattr(self, test_name)() 2025-12-04T13:38:32.0646808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0646843Z fn() 2025-12-04T13:38:32.0646999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0647041Z method(*args, **kwargs) 2025-12-04T13:38:32.0647194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0647233Z method(*args, **kwargs) 2025-12-04T13:38:32.0647386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0647424Z with policy(): 2025-12-04T13:38:32.0647580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0647622Z raise RuntimeError(msg) 2025-12-04T13:38:32.0647994Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 
2025-12-04T13:38:32.0647996Z 2025-12-04T13:38:32.0648086Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0648333Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0648336Z 2025-12-04T13:38:32.0648428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0648430Z 2025-12-04T13:38:32.0648431Z 2025-12-04T13:38:32.0648509Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0648601Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0648841Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-74a9a6fcf6eb4745.xml - 2025-12-04T13:38:32.0648915Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0649184Z FAILED [58.3860s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0649233Z Traceback (most recent call last): 2025-12-04T13:38:32.0649400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0649444Z getattr(self, test_name)() 2025-12-04T13:38:32.0649644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0649682Z fn() 2025-12-04T13:38:32.0649838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0649879Z method(*args, **kwargs) 2025-12-04T13:38:32.0650034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0650075Z method(*args, **kwargs) 2025-12-04T13:38:32.0650227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0650264Z with policy(): 2025-12-04T13:38:32.0650421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0650463Z raise RuntimeError(msg) 2025-12-04T13:38:32.0650829Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0650848Z 2025-12-04T13:38:32.0650925Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0651173Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0651175Z 2025-12-04T13:38:32.0651263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0651327Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.0651395Z ====================== 1 failed, 32 deselected in 58.55s ======================= 2025-12-04T13:38:32.0651433Z Got exit code 1 2025-12-04T13:38:32.0651476Z Retrying single test... 2025-12-04T13:38:32.0651668Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-575314950bfef071.xml 2025-12-04T13:38:32.0651731Z ============================= test session starts ============================== 2025-12-04T13:38:32.0651847Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0651903Z cachedir: .pytest_cache 2025-12-04T13:38:32.0652063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0652114Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0652154Z configfile: pytest.ini 2025-12-04T13:38:32.0652323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0652398Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0652640Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0652683Z Running 1 items in this shard 2025-12-04T13:38:32.0652686Z 2025-12-04T13:38:32.0653023Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda I1204 13:10:40.594000 388929 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 388998 2025-12-04T13:38:32.0653181Z I1204 13:10:40.594000 388929 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 388999 2025-12-04T13:38:32.0653338Z I1204 13:10:40.595000 388929 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 389000 2025-12-04T13:38:32.0653493Z I1204 13:10:40.595000 388929 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 389001 2025-12-04T13:38:32.0654106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0654146Z _warn_cpu_init() 2025-12-04T13:38:32.0654717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0654769Z _warn_cpu_init() 2025-12-04T13:38:32.0655342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0655379Z _warn_cpu_init() 2025-12-04T13:38:32.0655946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0655984Z _warn_cpu_init() 2025-12-04T13:38:32.0656297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0656342Z return func(*args, **kwargs) 2025-12-04T13:38:32.0656487Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0656652Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0656939Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0657097Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0657392Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0657519Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0657795Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0657945Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0658235Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0658381Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0658661Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0658797Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0659076Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0659235Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0659773Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0659893Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0660087Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0660460Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0660575Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0660803Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0660968Z [rank2]:E1204 13:11:36.224000 389000 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0661012Z dist init r=2, world=4 2025-12-04T13:38:32.0661150Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0661316Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0661622Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0661777Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0662065Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0662188Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0662480Z [rank0]:E1204 13:11:36.267000 388998 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0662631Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0662912Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0663061Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0663337Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0663491Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0663769Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0663921Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0664408Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0664526Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0664724Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0665103Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0665219Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0665429Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0665597Z [rank0]:E1204 13:11:36.267000 388998 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0665638Z dist init r=0, world=4 2025-12-04T13:38:32.0665779Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0665951Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0666238Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0666395Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0666678Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0666817Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0667095Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0667245Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0667520Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0667670Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0667966Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0668103Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0668380Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0668528Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0669015Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0669135Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0669342Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0669770Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0669885Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0670098Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0670285Z [rank1]:E1204 13:11:36.269000 388999 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0670327Z dist init r=1, world=4 2025-12-04T13:38:32.0670465Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0670627Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0670921Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0671088Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0671378Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0671501Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0671780Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0671926Z [rank3]:E1204 13:11:36.276000 389001 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0672225Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0672375Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0672655Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0672795Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0673071Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0673225Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0673726Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0673844Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0674041Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0674411Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0674538Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0674751Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0674920Z [rank3]:E1204 13:11:36.276000 389001 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0674959Z dist init r=3, world=4 2025-12-04T13:38:32.0675300Z [rank0]:[W1204 13:11:36.506596506 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0675355Z FAILED [57.4837s] [100%] 2025-12-04T13:38:32.0675359Z 2025-12-04T13:38:32.0675416Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0675529Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.0675575Z Traceback (most recent call last): 2025-12-04T13:38:32.0675741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0675785Z self._join_processes(fn) 2025-12-04T13:38:32.0675959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0676013Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0676196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0676250Z raise RuntimeError(error) 2025-12-04T13:38:32.0676332Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.0676378Z Traceback (most recent call last): 2025-12-04T13:38:32.0676544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0676586Z getattr(self, test_name)() 2025-12-04T13:38:32.0676749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0676783Z fn() 2025-12-04T13:38:32.0676939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0676979Z method(*args, **kwargs) 2025-12-04T13:38:32.0677135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0677175Z method(*args, **kwargs) 2025-12-04T13:38:32.0677329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0677367Z with policy(): 2025-12-04T13:38:32.0677539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0677580Z raise RuntimeError(msg) 2025-12-04T13:38:32.0677947Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.0677950Z 2025-12-04T13:38:32.0678027Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0678275Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0678278Z 2025-12-04T13:38:32.0678378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0678380Z 2025-12-04T13:38:32.0678382Z 2025-12-04T13:38:32.0678459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0678549Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0678782Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-575314950bfef071.xml - 2025-12-04T13:38:32.0678846Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0679111Z FAILED [57.4837s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.0679171Z Traceback (most recent call last): 2025-12-04T13:38:32.0679339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0679381Z getattr(self, test_name)() 2025-12-04T13:38:32.0679544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0679626Z fn() 2025-12-04T13:38:32.0679781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0679821Z method(*args, **kwargs) 2025-12-04T13:38:32.0679974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0680015Z method(*args, **kwargs) 2025-12-04T13:38:32.0680184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0680221Z with policy(): 2025-12-04T13:38:32.0680378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0680421Z raise RuntimeError(msg) 2025-12-04T13:38:32.0680787Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0680789Z 2025-12-04T13:38:32.0680865Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0681112Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0681116Z 2025-12-04T13:38:32.0681207Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0681270Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
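The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") points at the standard explicit teardown call. A minimal sketch for a standalone reproduction, assuming the usual RANK/WORLD_SIZE/MASTER_ADDR environment set by the launcher; the workload body is a placeholder, not part of this test.

import torch.distributed as dist

# Assumes torchrun (or an equivalent launcher) provides RANK, WORLD_SIZE, MASTER_ADDR/PORT.
dist.init_process_group(backend="nccl", init_method="env://")
try:
    ...  # placeholder: run the distributed workload / test body here
finally:
    # Explicit teardown avoids the "destroy_process_group() was not called" warning.
    dist.destroy_process_group()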
2025-12-04T13:38:32.0681338Z ====================== 1 failed, 32 deselected in 57.65s ======================= 2025-12-04T13:38:32.0681391Z Got exit code 1 2025-12-04T13:38:32.0681587Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.0681716Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0681908Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-49f8d9fac1f68b0e.xml 2025-12-04T13:38:32.0681967Z ============================= test session starts ============================== 2025-12-04T13:38:32.0682084Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0682125Z cachedir: .pytest_cache 2025-12-04T13:38:32.0682299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0682346Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0682391Z configfile: pytest.ini 2025-12-04T13:38:32.0682555Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0682632Z collecting ... collected 60 items / 10 deselected / 50 selected 2025-12-04T13:38:32.0682685Z stepcurrent: skipping 10 already run items. 2025-12-04T13:38:32.0682733Z Running 23 items in this shard 2025-12-04T13:38:32.0682735Z 2025-12-04T13:38:32.0683053Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda I1204 13:11:40.736000 389331 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 389400 2025-12-04T13:38:32.0683223Z I1204 13:11:40.737000 389331 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 389401 2025-12-04T13:38:32.0683380Z I1204 13:11:40.737000 389331 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 389402 2025-12-04T13:38:32.0683531Z I1204 13:11:40.738000 389331 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 389403 2025-12-04T13:38:32.0683830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0683881Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0684173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0684236Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0684819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0684860Z _warn_cpu_init() 2025-12-04T13:38:32.0685433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0685477Z _warn_cpu_init() 2025-12-04T13:38:32.0685782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0685836Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0686402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0686443Z _warn_cpu_init() 2025-12-04T13:38:32.0686744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0686825Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0687113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0687190Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0687478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0687564Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0687860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0687905Z return func(*args, **kwargs) 2025-12-04T13:38:32.0688191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0688242Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0688819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0688872Z _warn_cpu_init() 2025-12-04T13:38:32.0689161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0689236Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0689466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0689511Z return func(*args, **kwargs) 2025-12-04T13:38:32.0689788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0689829Z return func(*args, **kwargs) 2025-12-04T13:38:32.0690081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0690121Z return func(*args, **kwargs) 2025-12-04T13:38:32.0690345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0690385Z return func(*args, **kwargs) 2025-12-04T13:38:32.0690607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0690648Z return func(*args, **kwargs) 2025-12-04T13:38:32.0690869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0690909Z return func(*args, **kwargs) 2025-12-04T13:38:32.0691147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0691188Z return func(*args, **kwargs) 2025-12-04T13:38:32.0691409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:38:32.0691450Z return func(*args, **kwargs) 2025-12-04T13:38:32.0691597Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0691782Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0692075Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0692236Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0692524Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0692651Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0692929Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0693101Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0693383Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0693532Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0693810Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0693950Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0694229Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0694387Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0694879Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0694998Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0695194Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0695569Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0695683Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0695899Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0696075Z [rank0]:E1204 13:11:48.547000 389400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0696116Z dist init r=0, world=4 2025-12-04T13:38:32.0696254Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0696415Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0696702Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0696855Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0697142Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0697277Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0697557Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0697705Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0697985Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0698135Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0698411Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0698559Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0698838Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0698988Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0699468Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0699641Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0699839Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0700523Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0700655Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0700869Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0701035Z [rank1]:E1204 13:11:48.550000 389401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0701075Z dist init r=1, world=4 2025-12-04T13:38:32.0701215Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0701375Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0701661Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0701837Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0702121Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0702247Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0702522Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0702670Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0702947Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0703097Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0703390Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0703526Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0703804Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0703954Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0704447Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0704562Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0704758Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0705134Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0705249Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0705467Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0705631Z [rank3]:E1204 13:11:48.574000 389403 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0705672Z dist init r=3, world=4 2025-12-04T13:38:32.0705810Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0705987Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0706305Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0706459Z [rank2]:E1204 13:11:48.583000 389402 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0706745Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0706867Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0707150Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0707298Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0707592Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0707742Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0708017Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0708155Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0708444Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0708596Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0709075Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.0709200Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0709396Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0709793Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0709907Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0710119Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0710287Z [rank2]:E1204 13:11:48.583000 389402 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0710339Z dist init r=2, world=4 2025-12-04T13:38:32.0710680Z [rank0]:[W1204 13:11:48.804148515 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0710719Z FAILED [9.8222s] [ 4%] 2025-12-04T13:38:32.0710725Z 2025-12-04T13:38:32.0710781Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0710885Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda _ 2025-12-04T13:38:32.0710931Z Traceback (most recent call last): 2025-12-04T13:38:32.0711098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0711143Z self._join_processes(fn) 2025-12-04T13:38:32.0711318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0711372Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0711555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0711599Z raise RuntimeError(error) 2025-12-04T13:38:32.0711692Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0711738Z Traceback (most recent call last): 2025-12-04T13:38:32.0711901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0711942Z getattr(self, test_name)() 2025-12-04T13:38:32.0712104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0712139Z fn() 2025-12-04T13:38:32.0712293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0712335Z method(*args, **kwargs) 2025-12-04T13:38:32.0712504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0712543Z method(*args, **kwargs) 2025-12-04T13:38:32.0712698Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0712738Z with policy(): 2025-12-04T13:38:32.0712893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0712933Z raise RuntimeError(msg) 2025-12-04T13:38:32.0713293Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0713310Z 2025-12-04T13:38:32.0713386Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0713626Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0713628Z 2025-12-04T13:38:32.0713718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0713720Z 2025-12-04T13:38:32.0713722Z 2025-12-04T13:38:32.0713797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0713885Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0714118Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-49f8d9fac1f68b0e.xml - 2025-12-04T13:38:32.0714192Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0714448Z FAILED [9.8222s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0714495Z Traceback (most recent call last): 2025-12-04T13:38:32.0714663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0714706Z getattr(self, test_name)() 2025-12-04T13:38:32.0714868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0714902Z fn() 2025-12-04T13:38:32.0715056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0715098Z method(*args, **kwargs) 2025-12-04T13:38:32.0715252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0715291Z method(*args, **kwargs) 2025-12-04T13:38:32.0715445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0715491Z with policy(): 2025-12-04T13:38:32.0715647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0715687Z raise RuntimeError(msg) 2025-12-04T13:38:32.0716047Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. 
CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0716051Z 2025-12-04T13:38:32.0716125Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0716362Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0716375Z 2025-12-04T13:38:32.0716462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0716525Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0716588Z ======================= 1 failed, 10 deselected in 9.96s ======================= 2025-12-04T13:38:32.0716625Z Got exit code 1 2025-12-04T13:38:32.0716668Z Retrying single test... 2025-12-04T13:38:32.0716860Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-14230de52c25d103.xml 2025-12-04T13:38:32.0716939Z ============================= test session starts ============================== 2025-12-04T13:38:32.0717053Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0717096Z cachedir: .pytest_cache 2025-12-04T13:38:32.0717254Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0717301Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0717342Z configfile: pytest.ini 2025-12-04T13:38:32.0717505Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0717579Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0717810Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0717854Z Running 1 items in this shard 2025-12-04T13:38:32.0717867Z 2025-12-04T13:38:32.0718180Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda I1204 13:11:53.447000 389733 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 389802 2025-12-04T13:38:32.0718337Z I1204 13:11:53.448000 389733 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 389803 2025-12-04T13:38:32.0718491Z I1204 13:11:53.448000 389733 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 389804 2025-12-04T13:38:32.0718643Z I1204 13:11:53.449000 389733 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 389805 2025-12-04T13:38:32.0718937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0718991Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0719279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0719330Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0719956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0719995Z _warn_cpu_init() 2025-12-04T13:38:32.0720284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0720333Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0720915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0720952Z _warn_cpu_init() 2025-12-04T13:38:32.0721240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0721302Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0721880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0721919Z _warn_cpu_init() 2025-12-04T13:38:32.0722493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0722548Z _warn_cpu_init() 2025-12-04T13:38:32.0722836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0722916Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0723203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0723280Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0723568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0723641Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0723929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0724011Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0724305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0724348Z return func(*args, **kwargs) 2025-12-04T13:38:32.0724578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0724622Z return func(*args, **kwargs) 2025-12-04T13:38:32.0724846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0724897Z return func(*args, **kwargs) 2025-12-04T13:38:32.0725121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0725163Z return func(*args, **kwargs) 2025-12-04T13:38:32.0725382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0725424Z return func(*args, **kwargs) 2025-12-04T13:38:32.0725642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0725696Z return func(*args, **kwargs) 2025-12-04T13:38:32.0725915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0725956Z return func(*args, **kwargs) 2025-12-04T13:38:32.0726176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0726219Z return func(*args, **kwargs) 2025-12-04T13:38:32.0726439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T13:38:32.0726481Z return func(*args, **kwargs) 2025-12-04T13:38:32.0726628Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0726804Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0727096Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0727250Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0727538Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0727662Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0727941Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0728100Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0728377Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0728525Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0728805Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0728945Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0729233Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0729383Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0729900Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
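[Editor's note] Two of the recurring warnings in this log concern process-group lifecycle: "barrier(): using the device under current context" and the ProcessGroupNCCL note that destroy_process_group() was never called before program exit. A minimal sketch addressing both, assuming a torchrun-style launcher that sets LOCAL_RANK (this is illustrative, not the test suite's own setup code):

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # provided by the launcher
    torch.cuda.set_device(local_rank)

    # Binding the group to a device up front silences the
    # "barrier(): using the device under current context" warning.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))

    dist.barrier()
    # ... test or training body ...

    # Explicit teardown avoids the
    # "destroy_process_group() was not called before program exit" warning.
    dist.destroy_process_group()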
2025-12-04T13:38:32.0730032Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0730228Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0730592Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0730708Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0730919Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0731099Z [rank0]:E1204 13:12:01.106000 389802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0731138Z dist init r=0, world=4 2025-12-04T13:38:32.0731277Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0731436Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0731724Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0731879Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0732167Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0732293Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0732583Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0732732Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0733007Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0733157Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0733456Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0733592Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0733869Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0734016Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0734516Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0734631Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0734826Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0735189Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0735315Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0735530Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0735695Z [rank2]:E1204 13:12:01.107000 389804 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0735735Z dist init r=2, world=4 2025-12-04T13:38:32.0735872Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0736032Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0736320Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0736476Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0736773Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0736895Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0737176Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0737324Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0737615Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0737762Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0738041Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0738180Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0738466Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0738615Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0739096Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.0739211Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0739407Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0739830Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0739945Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0740155Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0740321Z [rank1]:E1204 13:12:01.153000 389803 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0740361Z dist init r=1, world=4 2025-12-04T13:38:32.0740505Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0740665Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0740969Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0741122Z [rank3]:E1204 13:12:01.159000 389805 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0741410Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0741535Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0741814Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0741976Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0742251Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0742400Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0742674Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0742826Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0743107Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0743255Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0743736Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 
2025-12-04T13:38:32.0743863Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0744061Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0744423Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0744534Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0744746Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0744913Z [rank3]:E1204 13:12:01.159000 389805 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0744952Z dist init r=3, world=4 2025-12-04T13:38:32.0745301Z [rank0]:[W1204 13:12:01.309315532 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0745343Z FAILED [9.6202s] [100%] 2025-12-04T13:38:32.0745345Z 2025-12-04T13:38:32.0745401Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0745502Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda _ 2025-12-04T13:38:32.0745548Z Traceback (most recent call last): 2025-12-04T13:38:32.0745713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0745757Z self._join_processes(fn) 2025-12-04T13:38:32.0745934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0746000Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0746180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0746225Z raise RuntimeError(error) 2025-12-04T13:38:32.0746305Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0746352Z Traceback (most recent call last): 2025-12-04T13:38:32.0746513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0746557Z getattr(self, test_name)() 2025-12-04T13:38:32.0746727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0746764Z fn() 2025-12-04T13:38:32.0746916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0746959Z method(*args, **kwargs) 2025-12-04T13:38:32.0747112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0747154Z method(*args, **kwargs) 2025-12-04T13:38:32.0747304Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0747344Z with policy(): 2025-12-04T13:38:32.0747500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0747542Z raise RuntimeError(msg) 2025-12-04T13:38:32.0747902Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0747914Z 2025-12-04T13:38:32.0747992Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0748232Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0748234Z 2025-12-04T13:38:32.0748321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0748323Z 2025-12-04T13:38:32.0748384Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.0748428Z Traceback (most recent call last): 2025-12-04T13:38:32.0748593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0748635Z getattr(self, test_name)() 2025-12-04T13:38:32.0748796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0748830Z fn() 2025-12-04T13:38:32.0748984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0749040Z method(*args, **kwargs) 2025-12-04T13:38:32.0749193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0749233Z method(*args, **kwargs) 2025-12-04T13:38:32.0749384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0749420Z with policy(): 2025-12-04T13:38:32.0749615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0749658Z raise RuntimeError(msg) 2025-12-04T13:38:32.0750033Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.0750035Z 2025-12-04T13:38:32.0750113Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0750348Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0750350Z 2025-12-04T13:38:32.0750437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0750439Z 2025-12-04T13:38:32.0750454Z 2025-12-04T13:38:32.0750529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0750619Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0750853Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-14230de52c25d103.xml - 2025-12-04T13:38:32.0750915Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0751169Z FAILED [9.6202s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0751217Z Traceback (most recent call last): 2025-12-04T13:38:32.0751383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0751424Z getattr(self, test_name)() 2025-12-04T13:38:32.0751590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0751637Z fn() 2025-12-04T13:38:32.0751792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0751831Z method(*args, **kwargs) 2025-12-04T13:38:32.0751984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0752023Z method(*args, **kwargs) 2025-12-04T13:38:32.0752176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0752211Z with policy(): 2025-12-04T13:38:32.0752366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0752405Z raise RuntimeError(msg) 2025-12-04T13:38:32.0752763Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
2025-12-04T13:38:32.0752767Z 2025-12-04T13:38:32.0752841Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0753091Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0753093Z 2025-12-04T13:38:32.0753182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0753184Z 2025-12-04T13:38:32.0753242Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.0753290Z Traceback (most recent call last): 2025-12-04T13:38:32.0753453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0753498Z getattr(self, test_name)() 2025-12-04T13:38:32.0753658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0753695Z fn() 2025-12-04T13:38:32.0753857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0753898Z method(*args, **kwargs) 2025-12-04T13:38:32.0754048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0754089Z method(*args, **kwargs) 2025-12-04T13:38:32.0754350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0754388Z with policy(): 2025-12-04T13:38:32.0754553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0754596Z raise RuntimeError(msg) 2025-12-04T13:38:32.0754958Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0754961Z 2025-12-04T13:38:32.0755035Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0755274Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0755276Z 2025-12-04T13:38:32.0755362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0755426Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0755500Z ======================= 1 failed, 32 deselected in 9.78s ======================= 2025-12-04T13:38:32.0755539Z Got exit code 1 2025-12-04T13:38:32.0755579Z Retrying single test... 
2025-12-04T13:38:32.0755773Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-604e6d382b28e77b.xml 2025-12-04T13:38:32.0755831Z ============================= test session starts ============================== 2025-12-04T13:38:32.0755947Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0755988Z cachedir: .pytest_cache 2025-12-04T13:38:32.0756148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0756195Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0756236Z configfile: pytest.ini 2025-12-04T13:38:32.0756402Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0756480Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0756709Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0756755Z Running 1 items in this shard 2025-12-04T13:38:32.0756767Z 2025-12-04T13:38:32.0757078Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda I1204 13:12:05.485000 390135 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 390204 2025-12-04T13:38:32.0757236Z I1204 13:12:05.486000 390135 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 390205 2025-12-04T13:38:32.0757391Z I1204 13:12:05.487000 390135 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 390206 2025-12-04T13:38:32.0757543Z I1204 13:12:05.487000 390135 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 390207 2025-12-04T13:38:32.0757849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0757901Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0758483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0758532Z _warn_cpu_init() 2025-12-04T13:38:32.0758823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0758874Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0759446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0759484Z _warn_cpu_init() 2025-12-04T13:38:32.0759816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0759909Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0760197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0760274Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0760561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0760609Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0761189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0761240Z _warn_cpu_init() 2025-12-04T13:38:32.0761533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0761575Z return func(*args, **kwargs) 2025-12-04T13:38:32.0761863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0761938Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0762240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0762290Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.0762863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0762918Z _warn_cpu_init() 2025-12-04T13:38:32.0763205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0763282Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0763513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0763557Z return func(*args, **kwargs) 2025-12-04T13:38:32.0763780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0763821Z return func(*args, **kwargs) 2025-12-04T13:38:32.0764044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0764102Z return func(*args, **kwargs) 2025-12-04T13:38:32.0764325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0764366Z return func(*args, **kwargs) 2025-12-04T13:38:32.0764589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0764629Z return func(*args, **kwargs) 2025-12-04T13:38:32.0764849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0764889Z return func(*args, **kwargs) 2025-12-04T13:38:32.0765110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.0765152Z return func(*args, **kwargs) 2025-12-04T13:38:32.0765377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.0765417Z return func(*args, **kwargs) 2025-12-04T13:38:32.0765592Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0765758Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0766050Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0766211Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0766506Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0766634Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0766911Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0767061Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0767350Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0767504Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0767788Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0767926Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0768206Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0768365Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0768855Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
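Two of the warnings interleaved above are actionable at the call site: the UserWarning from _init_utils.py asks for a device_id when wrapping a CPU module with FSDP, and the FutureWarning says the NO_SHARD strategy is deprecated in favour of DistributedDataParallel. The sketch below shows both suggestions under assumed names (model and rank are placeholders, and it presumes a process group is already initialized); it is illustrative, not the test's actual setup code.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group(...) has already been called by the launcher.
rank = dist.get_rank()
device = torch.device("cuda", rank % torch.cuda.device_count())

model = torch.nn.Linear(8, 8)  # placeholder module, still on CPU

# Passing device_id lets FSDP run sharding initialization on the GPU instead of
# the CPU and is required for sync_module_states=True, silencing the warning above.
fsdp_model = FSDP(model, device_id=device, sync_module_states=True)

# The FutureWarning's suggested replacement for the deprecated NO_SHARD strategy
# is plain DDP:
ddp_model = DDP(torch.nn.Linear(8, 8).to(device), device_ids=[device.index])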
2025-12-04T13:38:32.0768973Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0769168Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0769533Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0769688Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0769921Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0770087Z [rank0]:E1204 13:12:13.248000 390204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0770128Z dist init r=0, world=4 2025-12-04T13:38:32.0770266Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0770428Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0770722Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0770888Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0771174Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0771297Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0771574Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0771740Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0772019Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0772167Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0772444Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0772584Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0772874Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0773025Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0773508Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.0773623Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0773822Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0774196Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0774312Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0774525Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0774692Z [rank2]:E1204 13:12:13.251000 390206 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0774732Z dist init r=2, world=4 2025-12-04T13:38:32.0774871Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0775039Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0775330Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0775484Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0775766Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0775903Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0776180Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0776329Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0776606Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0776754Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0777045Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0777182Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0777461Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0777608Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0778091Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.0778206Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0778413Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0778779Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0778891Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0779107Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0779271Z [rank3]:E1204 13:12:13.319000 390207 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0779323Z dist init r=3, world=4 2025-12-04T13:38:32.0779461Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0779657Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0779944Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0780118Z [rank1]:E1204 13:12:13.319000 390205 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0780406Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0780529Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0780807Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0780955Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0781235Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0781396Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0781677Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0781815Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0782092Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0782242Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0782740Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:38:32.0782856Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0783050Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0783416Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0783532Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0783756Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0783924Z [rank1]:E1204 13:12:13.319000 390205 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0783961Z dist init r=1, world=4 2025-12-04T13:38:32.0784302Z [rank0]:[W1204 13:12:13.497086933 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0784353Z FAILED [9.9195s] [100%] 2025-12-04T13:38:32.0784355Z 2025-12-04T13:38:32.0784413Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0784513Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda _ 2025-12-04T13:38:32.0784562Z Traceback (most recent call last): 2025-12-04T13:38:32.0784727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0784772Z self._join_processes(fn) 2025-12-04T13:38:32.0784946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0785000Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0785181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0785225Z raise RuntimeError(error) 2025-12-04T13:38:32.0785319Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0785364Z Traceback (most recent call last): 2025-12-04T13:38:32.0785528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0785571Z getattr(self, test_name)() 2025-12-04T13:38:32.0785732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0785767Z fn() 2025-12-04T13:38:32.0785920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0785961Z method(*args, **kwargs) 2025-12-04T13:38:32.0786114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0786155Z method(*args, **kwargs) 2025-12-04T13:38:32.0786310Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0786346Z with policy(): 2025-12-04T13:38:32.0786503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0786544Z raise RuntimeError(msg) 2025-12-04T13:38:32.0786917Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0786920Z 2025-12-04T13:38:32.0786997Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0787234Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0787238Z 2025-12-04T13:38:32.0787328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0787330Z 2025-12-04T13:38:32.0787332Z 2025-12-04T13:38:32.0787418Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0787508Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0787743Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-604e6d382b28e77b.xml - 2025-12-04T13:38:32.0787805Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0788056Z FAILED [9.9195s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0788117Z Traceback (most recent call last): 2025-12-04T13:38:32.0788282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0788326Z getattr(self, test_name)() 2025-12-04T13:38:32.0788488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0788522Z fn() 2025-12-04T13:38:32.0788677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0788717Z method(*args, **kwargs) 2025-12-04T13:38:32.0788871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0788910Z method(*args, **kwargs) 2025-12-04T13:38:32.0789063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0789112Z with policy(): 2025-12-04T13:38:32.0789266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0789306Z raise RuntimeError(msg) 2025-12-04T13:38:32.0789696Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. 
CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.0789698Z 2025-12-04T13:38:32.0789773Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0790010Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0790013Z 2025-12-04T13:38:32.0790100Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0790166Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0790229Z ====================== 1 failed, 32 deselected in 10.08s ======================= 2025-12-04T13:38:32.0790266Z Got exit code 1 2025-12-04T13:38:32.0790453Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda 2025-12-04T13:38:32.0790609Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0790803Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6a82e7f2d48533a4.xml 2025-12-04T13:38:32.0790861Z ============================= test session starts ============================== 2025-12-04T13:38:32.0790978Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0791021Z cachedir: .pytest_cache 2025-12-04T13:38:32.0791182Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0791227Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0791269Z configfile: pytest.ini 2025-12-04T13:38:32.0791449Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0791526Z collecting ... collected 60 items / 11 deselected / 49 selected 2025-12-04T13:38:32.0791578Z stepcurrent: skipping 11 already run items. 2025-12-04T13:38:32.0791623Z Running 22 items in this shard 2025-12-04T13:38:32.0791625Z 2025-12-04T13:38:32.0791933Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 13:12:18.128000 390537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 390606 2025-12-04T13:38:32.0792108Z I1204 13:12:18.129000 390537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 390607 2025-12-04T13:38:32.0792263Z I1204 13:12:18.130000 390537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 390608 2025-12-04T13:38:32.0792414Z I1204 13:12:18.131000 390537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 390609 2025-12-04T13:38:32.0793103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0793142Z _warn_cpu_init() 2025-12-04T13:38:32.0793736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0793773Z _warn_cpu_init() 2025-12-04T13:38:32.0794345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0794385Z _warn_cpu_init() 2025-12-04T13:38:32.0794966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0795006Z _warn_cpu_init() 2025-12-04T13:38:32.0795298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.0795343Z return func(*args, **kwargs) 2025-12-04T13:38:32.0795491Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0795654Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0795966Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0796121Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0796406Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0796530Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0796823Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0796971Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0797252Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0797400Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0797675Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0797826Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0798104Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0798254Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0798732Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
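The two process-group warnings logged earlier (barrier() "using the device under current context", and destroy_process_group() not being called before program exit) both point at explicit lifecycle management. A hedged sketch follows; the rank, world_size, and MASTER_* values are placeholders, not what the CI launcher uses.

import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

rank, world_size = 0, 1  # placeholders; a real launcher supplies these
device = torch.device("cuda", rank)

# Binding the group to a device via device_id avoids the barrier() warning
# about picking "the device under current context".
dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)

dist.barrier()

# Explicit teardown avoids the ProcessGroupNCCL warning about
# destroy_process_group() not being called before program exit.
dist.destroy_process_group()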
2025-12-04T13:38:32.0798850Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0799048Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0799419Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0799535Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0799784Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0799951Z [rank0]:E1204 13:13:13.999000 390606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0799989Z dist init r=0, world=4 2025-12-04T13:38:32.0800145Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0800308Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0800597Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0800752Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0801053Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0801181Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0801458Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0801608Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0801883Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0802059Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0802336Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0802472Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0802750Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0802898Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0803385Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:38:32.0803518Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0803712Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0804071Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0804185Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0804398Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0804572Z [rank2]:E1204 13:13:13.999000 390608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0804613Z dist init r=2, world=4 2025-12-04T13:38:32.0804750Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0804913Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0805204Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0805368Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0805655Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0805777Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0806055Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0806201Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0806489Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0806637Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0806912Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0807048Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0807325Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0807476Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0807963Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:38:32.0808079Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0808275Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0808631Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0808755Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0808966Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0809132Z [rank1]:E1204 13:13:14.004000 390607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0809169Z dist init r=1, world=4 2025-12-04T13:38:32.0809311Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0809483Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0809806Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0809967Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.0810253Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0810380Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0810663Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0810828Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0811106Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0811255Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0811540Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0811678Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0811959Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0812125Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0812605Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
2025-12-04T13:38:32.0812721Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0812922Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0813298Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0813411Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0813625Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0813802Z [rank3]:E1204 13:13:14.055000 390609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0813846Z dist init r=3, world=4 2025-12-04T13:38:32.0814184Z [rank0]:[W1204 13:13:14.170152696 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0814230Z FAILED [57.7880s] [ 4%] 2025-12-04T13:38:32.0814233Z 2025-12-04T13:38:32.0814290Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0814392Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T13:38:32.0814439Z Traceback (most recent call last): 2025-12-04T13:38:32.0814605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0814653Z self._join_processes(fn) 2025-12-04T13:38:32.0814837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0814896Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0815076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0815124Z raise RuntimeError(error) 2025-12-04T13:38:32.0815204Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0815253Z Traceback (most recent call last): 2025-12-04T13:38:32.0815417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0815464Z getattr(self, test_name)() 2025-12-04T13:38:32.0815624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0815664Z fn() 2025-12-04T13:38:32.0815817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0815861Z method(*args, **kwargs) 2025-12-04T13:38:32.0816014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0816058Z method(*args, **kwargs) 2025-12-04T13:38:32.0816219Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0816264Z with policy(): 2025-12-04T13:38:32.0816418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0816463Z raise RuntimeError(msg) 2025-12-04T13:38:32.0816816Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0816823Z 2025-12-04T13:38:32.0816898Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0817146Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0817150Z 2025-12-04T13:38:32.0817239Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0817241Z 2025-12-04T13:38:32.0817243Z 2025-12-04T13:38:32.0817320Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0817409Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0817646Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6a82e7f2d48533a4.xml - 2025-12-04T13:38:32.0817720Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0817972Z FAILED [57.7880s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0818019Z Traceback (most recent call last): 2025-12-04T13:38:32.0818187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0818233Z getattr(self, test_name)() 2025-12-04T13:38:32.0818395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0818433Z fn() 2025-12-04T13:38:32.0818585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0818641Z method(*args, **kwargs) 2025-12-04T13:38:32.0818794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0818836Z method(*args, **kwargs) 2025-12-04T13:38:32.0818987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0819029Z with policy(): 2025-12-04T13:38:32.0819182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0819226Z raise RuntimeError(msg) 2025-12-04T13:38:32.0819615Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:38:32.0819619Z 2025-12-04T13:38:32.0819699Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0819932Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0819935Z 2025-12-04T13:38:32.0820027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0820113Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0820179Z ====================== 1 failed, 11 deselected in 57.93s ======================= 2025-12-04T13:38:32.0820222Z Got exit code 1 2025-12-04T13:38:32.0820263Z Retrying single test... 2025-12-04T13:38:32.0820456Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f3d78b10e04e870d.xml 2025-12-04T13:38:32.0820515Z ============================= test session starts ============================== 2025-12-04T13:38:32.0820633Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0820674Z cachedir: .pytest_cache 2025-12-04T13:38:32.0820850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0820897Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0820943Z configfile: pytest.ini 2025-12-04T13:38:32.0821106Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0821183Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0821408Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0821469Z Running 1 items in this shard 2025-12-04T13:38:32.0821472Z 2025-12-04T13:38:32.0821778Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 13:13:18.584000 390939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 391008 2025-12-04T13:38:32.0821937Z I1204 13:13:18.585000 390939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 391009 2025-12-04T13:38:32.0822089Z I1204 13:13:18.585000 390939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 391010 2025-12-04T13:38:32.0822245Z I1204 13:13:18.586000 390939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 391011 2025-12-04T13:38:32.0822835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0822890Z _warn_cpu_init() 2025-12-04T13:38:32.0823463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0823502Z _warn_cpu_init() 2025-12-04T13:38:32.0823800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0823848Z return func(*args, **kwargs) 2025-12-04T13:38:32.0824437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0824481Z _warn_cpu_init() 2025-12-04T13:38:32.0825050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0825093Z _warn_cpu_init() 2025-12-04T13:38:32.0825246Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0825414Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0825708Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0825864Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0826162Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0826289Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0826569Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0826716Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0826997Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0827165Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0827444Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0827585Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0827862Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0828013Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0828498Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
2025-12-04T13:38:32.0828629Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0828828Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0829185Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0829304Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0829516Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0829738Z [rank1]:E1204 13:14:14.648000 391009 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0829778Z dist init r=1, world=4 2025-12-04T13:38:32.0829919Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0830078Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0830370Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0830542Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0830829Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0830956Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0831234Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0831387Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0831677Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0831828Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0832114Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0832249Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0832529Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0832680Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0833173Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0833292Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0833489Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0833851Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0833976Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0834192Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0834358Z [rank3]:E1204 13:14:14.652000 391011 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0834401Z dist init r=3, world=4 2025-12-04T13:38:32.0834539Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0834714Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0835005Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0835160Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0835448Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0835571Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0835852Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0836011Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.0836295Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0836441Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0836723Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0836863Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0837141Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0837302Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0837777Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0837894Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0838094Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0838462Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0838579Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0838792Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0838969Z [rank0]:E1204 13:14:14.655000 391008 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0839009Z dist init r=0, world=4 2025-12-04T13:38:32.0839150Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0839312Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0839643Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0839802Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.0840088Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0840228Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0840507Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0840656Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0840933Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0841083Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0841366Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0841517Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0841799Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0841948Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0842428Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:38:32.0842557Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0842756Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0843116Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0843229Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0843464Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0843628Z [rank2]:E1204 13:14:14.656000 391010 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0843670Z dist init r=2, world=4 2025-12-04T13:38:32.0844006Z [rank0]:[W1204 13:14:14.819319352 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0844050Z FAILED [57.9883s] [100%] 2025-12-04T13:38:32.0844052Z 2025-12-04T13:38:32.0844109Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0844213Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T13:38:32.0844270Z Traceback (most recent call last): 2025-12-04T13:38:32.0844437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0844484Z self._join_processes(fn) 2025-12-04T13:38:32.0844659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0844718Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0844897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0844944Z raise RuntimeError(error) 2025-12-04T13:38:32.0845024Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0845073Z Traceback (most recent call last): 2025-12-04T13:38:32.0845236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0845285Z getattr(self, test_name)() 2025-12-04T13:38:32.0845444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0845483Z fn() 2025-12-04T13:38:32.0845649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0845694Z method(*args, **kwargs) 2025-12-04T13:38:32.0845847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0845892Z method(*args, **kwargs) 2025-12-04T13:38:32.0846047Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0846088Z with policy(): 2025-12-04T13:38:32.0846245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0846293Z raise RuntimeError(msg) 2025-12-04T13:38:32.0846657Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:38:32.0846664Z 2025-12-04T13:38:32.0846740Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0846974Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0846977Z 2025-12-04T13:38:32.0847064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0847077Z 2025-12-04T13:38:32.0847142Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.0847188Z Traceback (most recent call last): 2025-12-04T13:38:32.0847355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0847397Z getattr(self, test_name)() 2025-12-04T13:38:32.0847560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0847595Z fn() 2025-12-04T13:38:32.0847749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0847789Z method(*args, **kwargs) 2025-12-04T13:38:32.0847944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0847984Z method(*args, **kwargs) 2025-12-04T13:38:32.0848142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0848191Z with policy(): 2025-12-04T13:38:32.0848347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0848388Z raise RuntimeError(msg) 2025-12-04T13:38:32.0848741Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:38:32.0848744Z 2025-12-04T13:38:32.0848821Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0849050Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0849053Z 2025-12-04T13:38:32.0849144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0849146Z 2025-12-04T13:38:32.0849205Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0849254Z Traceback (most recent call last): 2025-12-04T13:38:32.0849418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0849475Z getattr(self, test_name)() 2025-12-04T13:38:32.0849676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0849715Z fn() 2025-12-04T13:38:32.0849866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0849911Z method(*args, **kwargs) 2025-12-04T13:38:32.0850060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0850107Z method(*args, **kwargs) 2025-12-04T13:38:32.0850257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0850299Z with policy(): 2025-12-04T13:38:32.0850471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0850515Z raise RuntimeError(msg) 2025-12-04T13:38:32.0850870Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0850873Z 2025-12-04T13:38:32.0850948Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0851197Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0851201Z 2025-12-04T13:38:32.0851286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0851288Z 2025-12-04T13:38:32.0851290Z 2025-12-04T13:38:32.0851371Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0851460Z Process 1 terminated with exit code 10, terminating remaining processes. 
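The RuntimeError repeated above is raised by the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 policy, which snapshots per-device caching-allocator usage before the test body and compares it afterwards. The following is only a minimal sketch of that kind of before/after check, not the actual torch/testing/_internal/common_utils.py implementation; run_with_leak_check, the single device index, and the zero-growth threshold are assumptions for illustration.

import gc
import torch

def run_with_leak_check(fn, device=0):
    # Illustrative sketch only: snapshot caching-allocator usage, run the test
    # body, then verify the allocator returned to its starting point.
    if not torch.cuda.is_available():
        fn()
        return
    gc.collect()
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    fn()
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible leak: caching allocator went from {before} to {after} "
            f"bytes on device {device}"
        )

Under this kind of check, the FSDP test above reports allocator growth from 512 to 12800 bytes on every rank, which is why each worker exits with code 10.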
2025-12-04T13:38:32.0851702Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f3d78b10e04e870d.xml - 2025-12-04T13:38:32.0851766Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0852014Z FAILED [57.9883s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0852078Z Traceback (most recent call last): 2025-12-04T13:38:32.0852245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0852290Z getattr(self, test_name)() 2025-12-04T13:38:32.0852453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0852491Z fn() 2025-12-04T13:38:32.0852643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0852687Z method(*args, **kwargs) 2025-12-04T13:38:32.0852839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0852882Z method(*args, **kwargs) 2025-12-04T13:38:32.0853035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0853076Z with policy(): 2025-12-04T13:38:32.0853229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0853274Z raise RuntimeError(msg) 2025-12-04T13:38:32.0853637Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
2025-12-04T13:38:32.0853643Z 2025-12-04T13:38:32.0853717Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0853951Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0853954Z 2025-12-04T13:38:32.0854043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0854045Z 2025-12-04T13:38:32.0854108Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.0854153Z Traceback (most recent call last): 2025-12-04T13:38:32.0854332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0854376Z getattr(self, test_name)() 2025-12-04T13:38:32.0854540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0854575Z fn() 2025-12-04T13:38:32.0854728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0854768Z method(*args, **kwargs) 2025-12-04T13:38:32.0854921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0854972Z method(*args, **kwargs) 2025-12-04T13:38:32.0855126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0855164Z with policy(): 2025-12-04T13:38:32.0855321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0855363Z raise RuntimeError(msg) 2025-12-04T13:38:32.0855718Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
2025-12-04T13:38:32.0855721Z 2025-12-04T13:38:32.0855797Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0856028Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0856040Z 2025-12-04T13:38:32.0856131Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0856133Z 2025-12-04T13:38:32.0856192Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0856242Z Traceback (most recent call last): 2025-12-04T13:38:32.0856406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0856450Z getattr(self, test_name)() 2025-12-04T13:38:32.0856610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0856649Z fn() 2025-12-04T13:38:32.0856800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0856845Z method(*args, **kwargs) 2025-12-04T13:38:32.0856996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0857039Z method(*args, **kwargs) 2025-12-04T13:38:32.0857191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0857232Z with policy(): 2025-12-04T13:38:32.0857396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0857438Z raise RuntimeError(msg) 2025-12-04T13:38:32.0857793Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0857796Z 2025-12-04T13:38:32.0857870Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0858101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0858103Z 2025-12-04T13:38:32.0858205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0858275Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0858339Z ====================== 1 failed, 32 deselected in 58.15s ======================= 2025-12-04T13:38:32.0858380Z Got exit code 1 2025-12-04T13:38:32.0858420Z Retrying single test... 
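The UserWarning from _init_utils.py and the ProcessGroupNCCL warning that recur in each retry point at the same hygiene issues: the module is wrapped while still on CPU, barrier() has to guess the device, and destroy_process_group() is never called before exit. Below is a hedged sketch of what those recommendations look like when applied; the placeholder nn.Linear model, the environment handling, and the main() scaffolding are assumptions, not code from test_fsdp_core.py.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Passing device_id here is what the c10d_logger barrier() warning suggests.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))
    model = nn.Linear(8, 8)  # placeholder module, not the test's model
    # device_id tells FSDP to move the CPU module to the GPU before sharding
    # initialization, as the _init_utils.py UserWarning recommends.
    fsdp_model = FSDP(model, device_id=local_rank)
    out = fsdp_model(torch.randn(4, 8, device=f"cuda:{local_rank}"))
    out.sum().backward()
    # Explicit shutdown avoids the "destroy_process_group() was not called
    # before program exit" warning from ProcessGroupNCCL.
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example, torchrun --nproc_per_node=4 example.py, the device_id arguments address the CPU-init and barrier() warnings and the explicit destroy_process_group() call addresses the shutdown warning; none of this by itself explains the allocator growth the leak check reports.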
2025-12-04T13:38:32.0858616Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7fe17ceee5b62d33.xml 2025-12-04T13:38:32.0858675Z ============================= test session starts ============================== 2025-12-04T13:38:32.0858804Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0858846Z cachedir: .pytest_cache 2025-12-04T13:38:32.0859008Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0859056Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0859100Z configfile: pytest.ini 2025-12-04T13:38:32.0859267Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0859341Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0859606Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0859650Z Running 1 items in this shard 2025-12-04T13:38:32.0859653Z 2025-12-04T13:38:32.0859978Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda I1204 13:14:18.922000 391341 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 391410 2025-12-04T13:38:32.0860136Z I1204 13:14:18.923000 391341 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 391411 2025-12-04T13:38:32.0864606Z I1204 13:14:18.923000 391341 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 391412 2025-12-04T13:38:32.0864770Z I1204 13:14:18.924000 391341 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 391413 2025-12-04T13:38:32.0865359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0865400Z _warn_cpu_init() 2025-12-04T13:38:32.0866003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0866046Z _warn_cpu_init() 2025-12-04T13:38:32.0866617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0866664Z _warn_cpu_init() 2025-12-04T13:38:32.0866973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0867021Z return func(*args, **kwargs) 2025-12-04T13:38:32.0867594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0867648Z _warn_cpu_init() 2025-12-04T13:38:32.0867799Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0867965Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0868260Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0868416Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0868705Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0868847Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0869128Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0869279Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0869556Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0869737Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0870017Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0870174Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0870455Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0870604Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0871092Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:38:32.0871223Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0871424Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0871783Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0871913Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0872131Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0872298Z [rank2]:E1204 13:15:14.743000 391412 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0872341Z dist init r=2, world=4 2025-12-04T13:38:32.0872481Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0872644Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0872932Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0873104Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0873393Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0873520Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0873800Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0873949Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0874231Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0874378Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0874670Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0874808Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0875087Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0875238Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0875728Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0875847Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0876043Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0876417Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0876535Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0876750Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0876918Z [rank3]:E1204 13:15:14.751000 391413 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0876958Z dist init r=3, world=4 2025-12-04T13:38:32.0877098Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0877262Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0877565Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0877719Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0878008Z [rank1]:E1204 13:15:14.790000 391411 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0878132Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0878411Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0878562Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0878861Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0879011Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0879286Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0879428Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0879756Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0879908Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0880388Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
2025-12-04T13:38:32.0880515Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0880712Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0881069Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0881186Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0881400Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0881567Z [rank1]:E1204 13:15:14.790000 391411 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0881622Z dist init r=1, world=4 2025-12-04T13:38:32.0881759Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0881922Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0882209Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0882366Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0882651Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0882779Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0883075Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0883225Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0883503Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0883651Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0883929Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0884076Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0884365Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0884514Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0884993Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0885121Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0885318Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0885675Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0885787Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0886012Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0886176Z [rank0]:E1204 13:15:14.796000 391410 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0886217Z dist init r=0, world=4 2025-12-04T13:38:32.0886560Z [rank0]:[W1204 13:15:15.060813889 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0886600Z FAILED [57.7865s] [100%] 2025-12-04T13:38:32.0886603Z 2025-12-04T13:38:32.0886663Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0886762Z ___ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda ___ 2025-12-04T13:38:32.0886813Z Traceback (most recent call last): 2025-12-04T13:38:32.0886979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0887027Z self._join_processes(fn) 2025-12-04T13:38:32.0887202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0887268Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0887447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0887494Z raise RuntimeError(error) 2025-12-04T13:38:32.0887573Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0887622Z Traceback (most recent call last): 2025-12-04T13:38:32.0887784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0887830Z getattr(self, test_name)() 2025-12-04T13:38:32.0887988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0888025Z fn() 2025-12-04T13:38:32.0888187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0888231Z method(*args, **kwargs) 2025-12-04T13:38:32.0888381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0888425Z method(*args, **kwargs) 2025-12-04T13:38:32.0888577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0888614Z with policy(): 2025-12-04T13:38:32.0888782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0888824Z raise RuntimeError(msg) 2025-12-04T13:38:32.0889180Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
2025-12-04T13:38:32.0889182Z 2025-12-04T13:38:32.0889259Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0889493Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0889495Z 2025-12-04T13:38:32.0889612Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0889614Z 2025-12-04T13:38:32.0889617Z 2025-12-04T13:38:32.0889697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0889803Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0890039Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7fe17ceee5b62d33.xml - 2025-12-04T13:38:32.0890102Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0890353Z FAILED [57.7865s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0890401Z Traceback (most recent call last): 2025-12-04T13:38:32.0890566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0890614Z getattr(self, test_name)() 2025-12-04T13:38:32.0890776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0890813Z fn() 2025-12-04T13:38:32.0890968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0891011Z method(*args, **kwargs) 2025-12-04T13:38:32.0891179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0891222Z method(*args, **kwargs) 2025-12-04T13:38:32.0891373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0891412Z with policy(): 2025-12-04T13:38:32.0891565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0891608Z raise RuntimeError(msg) 2025-12-04T13:38:32.0891959Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0891965Z 2025-12-04T13:38:32.0892053Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0892289Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0892291Z 2025-12-04T13:38:32.0892377Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0892444Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.0892506Z ====================== 1 failed, 32 deselected in 57.95s ======================= 2025-12-04T13:38:32.0892560Z Got exit code 1 2025-12-04T13:38:32.0892740Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda 2025-12-04T13:38:32.0892871Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0893060Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2db6c407e5a97404.xml 2025-12-04T13:38:32.0893123Z ============================= test session starts ============================== 2025-12-04T13:38:32.0893237Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0893281Z cachedir: .pytest_cache 2025-12-04T13:38:32.0893439Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0893490Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0893533Z configfile: pytest.ini 2025-12-04T13:38:32.0893715Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0893791Z collecting ... collected 60 items / 12 deselected / 48 selected 2025-12-04T13:38:32.0893844Z stepcurrent: skipping 12 already run items. 2025-12-04T13:38:32.0893892Z Running 21 items in this shard 2025-12-04T13:38:32.0893894Z 2025-12-04T13:38:32.0894215Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda I1204 13:15:19.545000 391743 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 391812 2025-12-04T13:38:32.0894374Z I1204 13:15:19.546000 391743 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 391813 2025-12-04T13:38:32.0894527Z I1204 13:15:19.546000 391743 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 391814 2025-12-04T13:38:32.0894682Z I1204 13:15:19.547000 391743 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 391815 2025-12-04T13:38:32.0895275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0895315Z _warn_cpu_init() 2025-12-04T13:38:32.0895891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0895930Z _warn_cpu_init() 2025-12-04T13:38:32.0896239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0896281Z return func(*args, **kwargs) 2025-12-04T13:38:32.0896858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0896909Z _warn_cpu_init() 2025-12-04T13:38:32.0897483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0897522Z _warn_cpu_init() 2025-12-04T13:38:32.0897667Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0897831Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0898123Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0898296Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0898585Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0898713Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0898992Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0899142Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0899429Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0899628Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0899907Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0900045Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0900327Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0900479Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0900981Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0901100Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0901296Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0901686Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0901803Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0902015Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0902183Z [rank0]:E1204 13:16:15.310000 391812 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0902222Z dist init r=0, world=4 2025-12-04T13:38:32.0902364Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0902537Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0902826Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0902979Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0903267Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0903395Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0903671Z [rank3]:E1204 13:16:15.363000 391815 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0903822Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0904112Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0904261Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0904536Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0904679Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0904972Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0905119Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0905607Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
2025-12-04T13:38:32.0905733Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0905933Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0906304Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0906421Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0906637Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0906813Z [rank3]:E1204 13:16:15.363000 391815 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0906854Z dist init r=3, world=4 2025-12-04T13:38:32.0906992Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0907155Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0907441Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0907596Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0907882Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0908009Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0908296Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0908444Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0908725Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0908875Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0909163Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0909300Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0909622Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0909773Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0910283Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:38:32.0910402Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0910597Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0910971Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0911099Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0911314Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0911482Z [rank1]:E1204 13:16:15.382000 391813 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0911520Z dist init r=1, world=4 2025-12-04T13:38:32.0911658Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0911818Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0912107Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0912263Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0912566Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0912690Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0912969Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0913119Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0913410Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0913561Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0913837Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0913974Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0914252Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0914415Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0914903Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:38:32.0915017Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0915214Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0915593Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0915710Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0915921Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0916087Z [rank2]:E1204 13:16:15.388000 391814 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0916127Z dist init r=2, world=4 2025-12-04T13:38:32.0916463Z [rank0]:[W1204 13:16:15.495789825 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0916506Z FAILED [57.7779s] [ 4%] 2025-12-04T13:38:32.0916508Z 2025-12-04T13:38:32.0916565Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0916687Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.0916734Z Traceback (most recent call last): 2025-12-04T13:38:32.0916901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0916945Z self._join_processes(fn) 2025-12-04T13:38:32.0917119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0917174Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0917356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0917400Z raise RuntimeError(error) 2025-12-04T13:38:32.0917492Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0917538Z Traceback (most recent call last): 2025-12-04T13:38:32.0917702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0917744Z getattr(self, test_name)() 2025-12-04T13:38:32.0917907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0917941Z fn() 2025-12-04T13:38:32.0918097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0918152Z method(*args, **kwargs) 2025-12-04T13:38:32.0918306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0918348Z method(*args, **kwargs) 2025-12-04T13:38:32.0918500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0918540Z with policy(): 2025-12-04T13:38:32.0918693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0918736Z raise RuntimeError(msg) 2025-12-04T13:38:32.0919099Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:38:32.0919103Z 2025-12-04T13:38:32.0919191Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0919434Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0919436Z 2025-12-04T13:38:32.0919527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0919529Z 2025-12-04T13:38:32.0919635Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0919680Z Traceback (most recent call last): 2025-12-04T13:38:32.0919846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0919888Z getattr(self, test_name)() 2025-12-04T13:38:32.0920049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0920084Z fn() 2025-12-04T13:38:32.0920238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0920278Z method(*args, **kwargs) 2025-12-04T13:38:32.0920433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0920473Z method(*args, **kwargs) 2025-12-04T13:38:32.0920639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0920676Z with policy(): 2025-12-04T13:38:32.0920830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0920870Z raise RuntimeError(msg) 2025-12-04T13:38:32.0921236Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0921240Z 2025-12-04T13:38:32.0921314Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0921571Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0921574Z 2025-12-04T13:38:32.0921663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0921665Z 2025-12-04T13:38:32.0921667Z 2025-12-04T13:38:32.0921743Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0921833Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.0922068Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2db6c407e5a97404.xml - 2025-12-04T13:38:32.0922145Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0922403Z FAILED [57.7779s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.0922453Z Traceback (most recent call last): 2025-12-04T13:38:32.0922618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0922663Z getattr(self, test_name)() 2025-12-04T13:38:32.0922827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0922864Z fn() 2025-12-04T13:38:32.0923015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0923073Z method(*args, **kwargs) 2025-12-04T13:38:32.0923230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0923270Z method(*args, **kwargs) 2025-12-04T13:38:32.0923426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0923464Z with policy(): 2025-12-04T13:38:32.0923621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0923662Z raise RuntimeError(msg) 2025-12-04T13:38:32.0924030Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 
2025-12-04T13:38:32.0924034Z 2025-12-04T13:38:32.0924108Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0924352Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0924355Z 2025-12-04T13:38:32.0924443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0924455Z 2025-12-04T13:38:32.0924519Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0924564Z Traceback (most recent call last): 2025-12-04T13:38:32.0924730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0924775Z getattr(self, test_name)() 2025-12-04T13:38:32.0924936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0924975Z fn() 2025-12-04T13:38:32.0925127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0925169Z method(*args, **kwargs) 2025-12-04T13:38:32.0925338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0925381Z method(*args, **kwargs) 2025-12-04T13:38:32.0925533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0925573Z with policy(): 2025-12-04T13:38:32.0925726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0925770Z raise RuntimeError(msg) 2025-12-04T13:38:32.0926131Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0926145Z 2025-12-04T13:38:32.0926222Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0926463Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0926470Z 2025-12-04T13:38:32.0926557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0926626Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.0926688Z ====================== 1 failed, 12 deselected in 57.92s ======================= 2025-12-04T13:38:32.0926729Z Got exit code 1 2025-12-04T13:38:32.0926770Z Retrying single test... 
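The repeated RuntimeError above comes from PyTorch's CUDA memory-leak checker, which the printed repro commands enable with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it snapshots GPU memory around each test at both the caching-allocator level and the driver level, and fails the test when either number has grown (the "512 ... is now reported as 12800" and "driver allocated memory was ... and is now ..." figures per rank). The following is a minimal, illustrative sketch of that kind of check built only on public torch.cuda APIs (torch.cuda.memory_allocated and torch.cuda.mem_get_info); it is an assumption-level approximation, not the CudaMemoryLeakCheck context manager that common_utils.py actually uses.

import contextlib
import torch

@contextlib.contextmanager
def assert_no_cuda_leak(device: int = 0):
    # Snapshot before the wrapped block: caching-allocator bytes plus a
    # driver-level "used" figure derived from cudaMemGetInfo/hipMemGetInfo.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    yield
    # Snapshot after: return cached blocks to the driver first so the
    # driver-level numbers are comparable to the "before" reading.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before or (total - free_after) > (total - free_before):
        raise RuntimeError(
            f"possible leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver "
            f"{total - free_before} -> {total - free_after} bytes"
        )

Wrapping a test body as "with assert_no_cuda_leak(rank): ..." would raise on growth like the per-rank allocator jump reported above; in the CI run this check fires identically on every rank and on the single-test retry, which is why the harness marks the tests FAILED CONSISTENTLY rather than flaky.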
2025-12-04T13:38:32.0926965Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f1234f2e29248a94.xml 2025-12-04T13:38:32.0927036Z ============================= test session starts ============================== 2025-12-04T13:38:32.0927153Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0927195Z cachedir: .pytest_cache 2025-12-04T13:38:32.0927360Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0927407Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0927454Z configfile: pytest.ini 2025-12-04T13:38:32.0927618Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0927695Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0927933Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0927981Z Running 1 items in this shard 2025-12-04T13:38:32.0927983Z 2025-12-04T13:38:32.0928309Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda I1204 13:16:20.163000 392145 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 392214 2025-12-04T13:38:32.0928470Z I1204 13:16:20.163000 392145 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 392215 2025-12-04T13:38:32.0928623Z I1204 13:16:20.164000 392145 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 392216 2025-12-04T13:38:32.0928774Z I1204 13:16:20.164000 392145 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 392217 2025-12-04T13:38:32.0929372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0929415Z _warn_cpu_init() 2025-12-04T13:38:32.0930026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0930083Z _warn_cpu_init() 2025-12-04T13:38:32.0930653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0930694Z _warn_cpu_init() 2025-12-04T13:38:32.0930985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0931032Z return func(*args, **kwargs) 2025-12-04T13:38:32.0931606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0931662Z _warn_cpu_init() 2025-12-04T13:38:32.0931809Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0931971Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0932268Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0932425Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0932714Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0932853Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0933134Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0933287Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0933568Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0933733Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0934008Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0934150Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0934426Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0934591Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0935085Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0935200Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0935398Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0935768Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0935897Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0936110Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0936278Z [rank3]:E1204 13:17:16.104000 392217 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0936318Z dist init r=3, world=4 2025-12-04T13:38:32.0936459Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0936621Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0936911Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0937082Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0937368Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0937497Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0937772Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0937926Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0938215Z [rank0]:E1204 13:17:16.133000 392214 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0938362Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0938640Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0938787Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0939070Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0939221Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0939745Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0939865Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0940077Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0940450Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0940564Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0940777Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0940940Z [rank0]:E1204 13:17:16.133000 392214 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0940984Z dist init r=0, world=4 2025-12-04T13:38:32.0941121Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0941284Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0941596Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0941752Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0942040Z 
[rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0942165Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0942457Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0942605Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0942887Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0943038Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0943329Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0943469Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0943749Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0943901Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0944390Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
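The _init_utils.py UserWarning repeated above recommends passing `device_id` so FSDP moves the CPU module to its GPU before sharding (also required for `sync_module_states=True`). A minimal sketch of that pattern; the toy model and rank handling are placeholders, not the test's actual module:

    # Minimal sketch: passing device_id so FSDP shards on GPU instead of CPU.
    # The Linear model here is illustrative only.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        model = torch.nn.Linear(8, 8)  # starts on CPU, as in the warning above
        return FSDP(
            model,
            device_id=rank,            # shard on this GPU; silences the CPU-init warning
            sync_module_states=True,   # needs the module on GPU, hence device_id
        )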
2025-12-04T13:38:32.0944520Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0944719Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0945086Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0945203Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0945417Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0945585Z [rank1]:E1204 13:17:16.136000 392215 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0945624Z dist init r=1, world=4 2025-12-04T13:38:32.0945774Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0945934Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0946224Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0946384Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0946680Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0946809Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0947087Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0947238Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0947525Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0947681Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0947961Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0948098Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0948379Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0948529Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0949029Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 2025-12-04T13:38:32.0949146Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0949342Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0949765Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0949880Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0950109Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0950274Z [rank2]:E1204 13:17:16.161000 392216 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0950316Z dist init r=2, world=4 2025-12-04T13:38:32.0950651Z [rank0]:[W1204 13:17:16.378939219 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0950698Z FAILED [57.7858s] [100%] 2025-12-04T13:38:32.0950700Z 2025-12-04T13:38:32.0950760Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0950868Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.0950932Z Traceback (most recent call last): 2025-12-04T13:38:32.0951099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0951146Z self._join_processes(fn) 2025-12-04T13:38:32.0951320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0951378Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0951559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0951620Z raise RuntimeError(error) 2025-12-04T13:38:32.0951699Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0951748Z Traceback (most recent call last): 2025-12-04T13:38:32.0951911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0951957Z getattr(self, test_name)() 2025-12-04T13:38:32.0952118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0952156Z fn() 2025-12-04T13:38:32.0952314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0952359Z method(*args, **kwargs) 2025-12-04T13:38:32.0952510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0952559Z method(*args, **kwargs) 2025-12-04T13:38:32.0952725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0952766Z with policy(): 2025-12-04T13:38:32.0952923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0952968Z raise RuntimeError(msg) 2025-12-04T13:38:32.0953336Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 
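The ProcessGroupNCCL warning above recommends tearing the process group down before the program exits. A minimal sketch of that cleanup, assuming an ordinary init_process_group-style setup rather than the test harness itself:

    # Minimal sketch: explicit teardown to avoid the
    # "destroy_process_group() was not called before program exit" warning.
    import torch.distributed as dist

    def main() -> None:
        dist.init_process_group(backend="nccl")
        try:
            ...  # collectives / training steps
        finally:
            dist.destroy_process_group()  # release communicators before exit

    if __name__ == "__main__":
        main()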
2025-12-04T13:38:32.0953338Z 2025-12-04T13:38:32.0953412Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0953659Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0953663Z 2025-12-04T13:38:32.0953750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0953753Z 2025-12-04T13:38:32.0953754Z 2025-12-04T13:38:32.0953834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0953933Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0954168Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f1234f2e29248a94.xml - 2025-12-04T13:38:32.0954229Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0954492Z FAILED [57.7858s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.0954543Z Traceback (most recent call last): 2025-12-04T13:38:32.0954708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0954754Z getattr(self, test_name)() 2025-12-04T13:38:32.0954926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0954966Z fn() 2025-12-04T13:38:32.0955120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0955165Z method(*args, **kwargs) 2025-12-04T13:38:32.0955315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0955358Z method(*args, **kwargs) 2025-12-04T13:38:32.0955510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0955567Z with policy(): 2025-12-04T13:38:32.0955720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0955764Z raise RuntimeError(msg) 2025-12-04T13:38:32.0956130Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0956132Z 2025-12-04T13:38:32.0956210Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0956454Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0956462Z 2025-12-04T13:38:32.0956566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0956633Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
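The RuntimeError above is raised by the suite's CUDA memory-leak check, which compares caching-allocator and driver allocations before and after the test body. A rough illustration of the same before/after idea using public allocator counters; this is not the internal policy from common_utils.py:

    # Rough sketch of a before/after memory comparison in the spirit of the
    # leak check that raised the RuntimeError above; not the real implementation.
    import torch

    def run_with_leak_check(fn, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)   # caching-allocator bytes in use
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak on device {device}: {before} -> {after} bytes allocated"
            )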
2025-12-04T13:38:32.0956696Z ====================== 1 failed, 32 deselected in 57.93s ======================= 2025-12-04T13:38:32.0956738Z Got exit code 1 2025-12-04T13:38:32.0956779Z Retrying single test... 2025-12-04T13:38:32.0956973Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3cb93b52e3d78be7.xml 2025-12-04T13:38:32.0957031Z ============================= test session starts ============================== 2025-12-04T13:38:32.0957147Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0957189Z cachedir: .pytest_cache 2025-12-04T13:38:32.0957352Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0957400Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0957444Z configfile: pytest.ini 2025-12-04T13:38:32.0957608Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0957686Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.0957934Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0957981Z Running 1 items in this shard 2025-12-04T13:38:32.0957983Z 2025-12-04T13:38:32.0958300Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda I1204 13:17:20.850000 392547 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 392616 2025-12-04T13:38:32.0958460Z I1204 13:17:20.850000 392547 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 392617 2025-12-04T13:38:32.0958617Z I1204 13:17:20.851000 392547 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 392618 2025-12-04T13:38:32.0958778Z I1204 13:17:20.851000 392547 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 392619 2025-12-04T13:38:32.0959363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0959401Z _warn_cpu_init() 2025-12-04T13:38:32.0960015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.0960058Z _warn_cpu_init() 2025-12-04T13:38:32.0960629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0960672Z _warn_cpu_init() 2025-12-04T13:38:32.0961255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0961297Z _warn_cpu_init() 2025-12-04T13:38:32.0961593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.0961636Z return func(*args, **kwargs) 2025-12-04T13:38:32.0961782Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0961947Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0962239Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0962407Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0962695Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0962821Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0963103Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0963267Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0963543Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0963693Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0963971Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0964126Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0964404Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0964556Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0965045Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3965714432. 2025-12-04T13:38:32.0965173Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0965372Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0965745Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0965862Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0966074Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0966244Z [rank0]:E1204 13:18:16.796000 392616 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.0966288Z dist init r=0, world=4 2025-12-04T13:38:32.0966428Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0966601Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0966890Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0967047Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0967331Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0967459Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0967747Z [rank2]:E1204 13:18:16.803000 392618 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0967899Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0968180Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0968339Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0968620Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0968758Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0969037Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0969187Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0969710Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3812622336. 
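The c10d_logger barrier() warning earlier in this session can be muted by binding the process group to a specific device at init time. A minimal sketch, assuming the local rank comes from the usual LOCAL_RANK environment variable:

    # Minimal sketch: binding the process group to an explicit device so that
    # barrier() does not have to infer the device from the current context.
    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", local_rank),  # mutes the barrier() warning
    )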
2025-12-04T13:38:32.0969845Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0970044Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0970414Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0970530Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0970745Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0970924Z [rank2]:E1204 13:18:16.803000 392618 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.0970967Z dist init r=2, world=4 2025-12-04T13:38:32.0971107Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0971266Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0971556Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0971712Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0972013Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0972137Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0972421Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0972569Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0972868Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0973019Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0973298Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0973437Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.0973720Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0973884Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0974370Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:38:32.0974488Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0974686Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0975058Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0975177Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0975398Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0975566Z [rank1]:E1204 13:18:16.818000 392617 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.0975604Z dist init r=1, world=4 2025-12-04T13:38:32.0975747Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.0975911Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.0976209Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0976366Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.0976650Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0976776Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.0977063Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0977215Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.0977495Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0977642Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.0977921Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0978069Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.0978351Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0978500Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.0978990Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3762290688. 2025-12-04T13:38:32.0979109Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0979304Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0979718Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0979832Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.0980045Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0980212Z [rank3]:E1204 13:18:16.826000 392619 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.0980255Z dist init r=3, world=4 2025-12-04T13:38:32.0980607Z [rank0]:[W1204 13:18:17.057057895 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.0980652Z FAILED [57.8792s] [100%] 2025-12-04T13:38:32.0980654Z 2025-12-04T13:38:32.0980714Z =================================== FAILURES =================================== 2025-12-04T13:38:32.0980824Z _ TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.0980873Z Traceback (most recent call last): 2025-12-04T13:38:32.0981038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.0981099Z self._join_processes(fn) 2025-12-04T13:38:32.0981275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.0981334Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.0981516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.0981564Z raise RuntimeError(error) 2025-12-04T13:38:32.0981643Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0981693Z Traceback (most recent call last): 2025-12-04T13:38:32.0981856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0981902Z getattr(self, test_name)() 2025-12-04T13:38:32.0982061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0982115Z fn() 2025-12-04T13:38:32.0982268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0982314Z method(*args, **kwargs) 2025-12-04T13:38:32.0982467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0982512Z method(*args, **kwargs) 2025-12-04T13:38:32.0982667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0982707Z with policy(): 2025-12-04T13:38:32.0982864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0982909Z raise RuntimeError(msg) 2025-12-04T13:38:32.0983276Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 
2025-12-04T13:38:32.0983280Z 2025-12-04T13:38:32.0983356Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0983612Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0983614Z 2025-12-04T13:38:32.0983703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0983706Z 2025-12-04T13:38:32.0983707Z 2025-12-04T13:38:32.0983786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.0983875Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.0984113Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3cb93b52e3d78be7.xml - 2025-12-04T13:38:32.0984178Z =========================== short test summary info ============================ 2025-12-04T13:38:32.0984447Z FAILED [57.8792s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.0984497Z Traceback (most recent call last): 2025-12-04T13:38:32.0984662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.0984707Z getattr(self, test_name)() 2025-12-04T13:38:32.0984868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.0984910Z fn() 2025-12-04T13:38:32.0985075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0985121Z method(*args, **kwargs) 2025-12-04T13:38:32.0985277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.0985321Z method(*args, **kwargs) 2025-12-04T13:38:32.0985475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.0985516Z with policy(): 2025-12-04T13:38:32.0985670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.0985715Z raise RuntimeError(msg) 2025-12-04T13:38:32.0986077Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 2317352960 and is now 3829399552. 2025-12-04T13:38:32.0986094Z 2025-12-04T13:38:32.0986169Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.0986412Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0986414Z 2025-12-04T13:38:32.0986502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.0986569Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.0986632Z ====================== 1 failed, 32 deselected in 58.02s ======================= 2025-12-04T13:38:32.0986674Z Got exit code 1 2025-12-04T13:38:32.0986870Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.0987001Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.0987192Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8a4f83dd6075c6e2.xml 2025-12-04T13:38:32.0987254Z ============================= test session starts ============================== 2025-12-04T13:38:32.0987380Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.0987425Z cachedir: .pytest_cache 2025-12-04T13:38:32.0987585Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.0987636Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.0987677Z configfile: pytest.ini 2025-12-04T13:38:32.0987842Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.0987917Z collecting ... collected 60 items / 13 deselected / 47 selected 2025-12-04T13:38:32.0987974Z stepcurrent: skipping 13 already run items. 2025-12-04T13:38:32.0988018Z Running 20 items in this shard 2025-12-04T13:38:32.0988023Z 2025-12-04T13:38:32.0988355Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda I1204 13:18:21.513000 392949 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 393018 2025-12-04T13:38:32.0988514Z I1204 13:18:21.514000 392949 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 393019 2025-12-04T13:38:32.0988667Z I1204 13:18:21.515000 392949 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 393020 2025-12-04T13:38:32.0988821Z I1204 13:18:21.515000 392949 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 393021 2025-12-04T13:38:32.0989411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0989454Z _warn_cpu_init() 2025-12-04T13:38:32.0989985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
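The warning above fires because `device_id` was given as a bare "cuda" device with no index; fixing the current device first, or passing an indexed device, makes the mapping explicit. A minimal sketch of both options (the rank handling is illustrative):

    # Minimal sketch: avoiding the "device_id cuda ... does not have an explicit
    # index" warning by pinning the device index before (or in) FSDP init.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap(module: torch.nn.Module, rank: int) -> FSDP:
        # Option 1: make the current device explicit before FSDP initialization.
        torch.cuda.set_device(rank)
        # Option 2: pass an indexed device rather than the bare "cuda".
        return FSDP(module, device_id=torch.device("cuda", rank))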
2025-12-04T13:38:32.0990049Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0990645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0990683Z _warn_cpu_init() 2025-12-04T13:38:32.0991175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.0991240Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0991829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0991870Z _warn_cpu_init() 2025-12-04T13:38:32.0992361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.0992425Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0993017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.0993055Z _warn_cpu_init() 2025-12-04T13:38:32.0993352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0993436Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.0993743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.0993823Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0994321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.0994382Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0994669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0994766Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.0995256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.0995318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0995605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0995686Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0995972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0996057Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.0996559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.0996618Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0996909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0996986Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0997286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
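Note: the FutureWarnings around this point state that the NO_SHARD sharding strategy is deprecated and suggest DistributedDataParallel instead. A minimal sketch of that replacement follows; `model` and `rank` are illustrative placeholders rather than names from the test, and an already-initialized process group is assumed.

    # Sketch only: wrapping an unsharded model with DDP, as the FutureWarning
    # suggests, instead of FSDP with ShardingStrategy.NO_SHARD.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_unsharded(model: torch.nn.Module, rank: int) -> DDP:
        model = model.to(torch.device("cuda", rank))  # move to this rank's GPU
        return DDP(model, device_ids=[rank])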
2025-12-04T13:38:32.0997367Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.0997856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.0997928Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.0998219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.0998298Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.0999568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.0999750Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.0999982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1000029Z return func(*args, **kwargs) 2025-12-04T13:38:32.1001297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
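Note: the UserWarning from torch/autograd/graph.py above names its own opt-out. If the AccumulateGrad stream mismatch is known to be intentional, the warning text says it can be silenced as sketched below; this is appropriate only after the mismatch has been confirmed as intended.

    # Sketch: suppressing the AccumulateGrad stream-mismatch warning, using the
    # call named in the warning text itself.
    import torch

    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)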
2025-12-04T13:38:32.1001424Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1001651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1001698Z return func(*args, **kwargs) 2025-12-04T13:38:32.1002970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1003107Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1003336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1003378Z return func(*args, **kwargs) 2025-12-04T13:38:32.1004639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1004775Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1005001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1005046Z return func(*args, **kwargs) 2025-12-04T13:38:32.1005274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.1005316Z return func(*args, **kwargs) 2025-12-04T13:38:32.1005540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1005597Z return func(*args, **kwargs) 2025-12-04T13:38:32.1005820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1005860Z return func(*args, **kwargs) 2025-12-04T13:38:32.1006081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1006123Z return func(*args, **kwargs) 2025-12-04T13:38:32.1006417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1006458Z return func(*args, **kwargs) 2025-12-04T13:38:32.1006617Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1006782Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1007081Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1007243Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1007540Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1007670Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1007951Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1008107Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1008385Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1008548Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1008828Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1008967Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1009250Z 
[rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1009399Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1009930Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17475567616. 2025-12-04T13:38:32.1010058Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1010259Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1010626Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1010742Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1010971Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1011140Z [rank2]:E1204 13:18:30.115000 393020 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1011183Z dist init r=2, world=4 2025-12-04T13:38:32.1011320Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1011483Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1011786Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1011953Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1012247Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1012372Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1012653Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1012815Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.1013095Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1013242Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1013523Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1013663Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1013942Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1014095Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1014586Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 2025-12-04T13:38:32.1014703Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1014899Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1015275Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1015394Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1015609Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1015776Z [rank1]:E1204 13:18:30.126000 393019 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1015816Z dist init r=1, world=4 2025-12-04T13:38:32.1015969Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1016130Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1016421Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1016576Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.1016865Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1016993Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1017279Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1017430Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1017706Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1017856Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1018132Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1018273Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1018565Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1018714Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1019198Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17425235968. 
2025-12-04T13:38:32.1019315Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1019526Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1019923Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1020036Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1020253Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1020432Z [rank3]:E1204 13:18:30.128000 393021 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1020475Z dist init r=3, world=4 2025-12-04T13:38:32.1020615Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1020778Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1021066Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1021222Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1021513Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1021659Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1021938Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1022085Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1022363Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1022511Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1022790Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1022938Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1023219Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1023370Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1023863Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 2025-12-04T13:38:32.1023981Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1024177Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1024540Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1024668Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1024881Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1025050Z [rank0]:E1204 13:18:30.173000 393018 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1025090Z dist init r=0, world=4 2025-12-04T13:38:32.1025429Z [rank2]:[W1204 13:18:30.283400411 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1025760Z [rank1]:[W1204 13:18:30.297842878 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1026102Z [rank3]:[W1204 13:18:30.323723909 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1026433Z [rank0]:[W1204 13:18:30.454648269 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1026474Z FAILED [22.9370s] [ 5%] 2025-12-04T13:38:32.1026476Z 2025-12-04T13:38:32.1026536Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1026638Z __ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda ___ 2025-12-04T13:38:32.1026690Z Traceback (most recent call last): 2025-12-04T13:38:32.1026856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1026903Z self._join_processes(fn) 2025-12-04T13:38:32.1027088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1027146Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1027326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1027374Z raise RuntimeError(error) 2025-12-04T13:38:32.1027455Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1027504Z Traceback (most recent call last): 2025-12-04T13:38:32.1027668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1027715Z getattr(self, test_name)() 2025-12-04T13:38:32.1027875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1027913Z fn() 2025-12-04T13:38:32.1028077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1028122Z method(*args, **kwargs) 2025-12-04T13:38:32.1028277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1028318Z method(*args, **kwargs) 2025-12-04T13:38:32.1028474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1028512Z with policy(): 2025-12-04T13:38:32.1028681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1028722Z raise RuntimeError(msg) 2025-12-04T13:38:32.1029081Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 
2025-12-04T13:38:32.1029083Z 2025-12-04T13:38:32.1029162Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1029399Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1029401Z 2025-12-04T13:38:32.1029490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1029493Z 2025-12-04T13:38:32.1029498Z 2025-12-04T13:38:32.1029623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1029714Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1029954Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8a4f83dd6075c6e2.xml - 2025-12-04T13:38:32.1030020Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1030271Z FAILED [22.9370s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1030321Z Traceback (most recent call last): 2025-12-04T13:38:32.1030489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1030537Z getattr(self, test_name)() 2025-12-04T13:38:32.1030700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1030738Z fn() 2025-12-04T13:38:32.1030892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1030936Z method(*args, **kwargs) 2025-12-04T13:38:32.1031104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1031149Z method(*args, **kwargs) 2025-12-04T13:38:32.1031300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1031341Z with policy(): 2025-12-04T13:38:32.1031494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1031540Z raise RuntimeError(msg) 2025-12-04T13:38:32.1031901Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 2025-12-04T13:38:32.1031907Z 2025-12-04T13:38:32.1031995Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1032234Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1032237Z 2025-12-04T13:38:32.1032324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1032391Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
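Note: both failing sessions repeat the same pair of FSDP UserWarnings: the module is wrapped while still on CPU, and `device_id` is passed as a bare "cuda" device with no index. The warnings' own suggested fix is to bind each rank to an explicit device, sketched below; `model` and `rank` are placeholders rather than names from the test, and an initialized process group is assumed.

    # Sketch of the fix the repeated FSDP UserWarnings recommend: give each
    # rank an explicit, indexed device before wrapping.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
        torch.cuda.set_device(rank)  # make the current device explicit
        # An indexed device_id also lets FSDP move the CPU module onto the GPU
        # for sharding initialization instead of defaulting to "current device".
        return FSDP(model, device_id=torch.device("cuda", rank))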
2025-12-04T13:38:32.1032454Z ====================== 1 failed, 13 deselected in 23.10s ======================= 2025-12-04T13:38:32.1032510Z Got exit code 1 2025-12-04T13:38:32.1032551Z Retrying single test... 2025-12-04T13:38:32.1032744Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-60c6062c7f204303.xml 2025-12-04T13:38:32.1032804Z ============================= test session starts ============================== 2025-12-04T13:38:32.1032924Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1032965Z cachedir: .pytest_cache 2025-12-04T13:38:32.1033129Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1033175Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1033221Z configfile: pytest.ini 2025-12-04T13:38:32.1033385Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1033464Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1033709Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1033752Z Running 1 items in this shard 2025-12-04T13:38:32.1033754Z 2025-12-04T13:38:32.1034066Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda I1204 13:18:46.955000 394215 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 394284 2025-12-04T13:38:32.1034222Z I1204 13:18:46.956000 394215 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 394285 2025-12-04T13:38:32.1034377Z I1204 13:18:46.956000 394215 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 394286 2025-12-04T13:38:32.1034528Z I1204 13:18:46.957000 394215 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 394287 2025-12-04T13:38:32.1035124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1035166Z _warn_cpu_init() 2025-12-04T13:38:32.1035737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1035781Z _warn_cpu_init() 2025-12-04T13:38:32.1036285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1036353Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1036840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1036921Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1037493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1037530Z _warn_cpu_init() 2025-12-04T13:38:32.1038023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1038093Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1038665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1038706Z _warn_cpu_init() 2025-12-04T13:38:32.1038997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1039086Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1039650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1039714Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1040007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1040088Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1040376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1040459Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1040764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1040842Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1041333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1041406Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1041699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1041778Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1042269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1042330Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1042619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1042710Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1042998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1043082Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1043578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1043638Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1043928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1044002Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1045299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1045430Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1046690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1046830Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1047062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1047107Z return func(*args, **kwargs) 2025-12-04T13:38:32.1047345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1047388Z return func(*args, **kwargs) 2025-12-04T13:38:32.1048641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1048765Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1050087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1050209Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1050444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1050487Z return func(*args, **kwargs) 2025-12-04T13:38:32.1050712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1050768Z return func(*args, **kwargs) 2025-12-04T13:38:32.1050990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1051032Z return func(*args, **kwargs) 2025-12-04T13:38:32.1051253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.1051294Z return func(*args, **kwargs) 2025-12-04T13:38:32.1051514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1051555Z return func(*args, **kwargs) 2025-12-04T13:38:32.1051775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1051831Z return func(*args, **kwargs) 2025-12-04T13:38:32.1052125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1052167Z return func(*args, **kwargs) 2025-12-04T13:38:32.1052313Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1052479Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1052775Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1052932Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1053220Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1053355Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1053634Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1053783Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1054061Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1054230Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1054509Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1054648Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1054925Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1055089Z [rank1]:E1204 
13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1055574Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 2025-12-04T13:38:32.1055692Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1055889Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1056253Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1056380Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1056594Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1056760Z [rank1]:E1204 13:18:55.448000 394285 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1056799Z dist init r=1, world=4 2025-12-04T13:38:32.1056938Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1057098Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1057391Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1057548Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1057844Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1057971Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1058247Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1058400Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1058687Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1058837Z 
[rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1059114Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1059250Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1059541Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1059729Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1060213Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 2025-12-04T13:38:32.1060329Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1060542Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1060911Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1061024Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1061236Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1061400Z [rank0]:E1204 13:18:55.449000 394284 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1061443Z dist init r=0, world=4 2025-12-04T13:38:32.1061579Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1061742Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1062041Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1062198Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1062483Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T13:38:32.1062607Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1062899Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1063047Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1063325Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1063471Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1063764Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1063904Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1064182Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1064332Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1064811Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17475567616. 
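The per-rank RuntimeErrors above come from the memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the harness records per-device allocator statistics before the test body runs and flags the test when those numbers do not return to their starting point. Below is a minimal sketch of that before/after comparison using only the public torch.cuda statistics APIs; it is not the harness's actual CudaMemoryLeakCheck, it omits the driver-level query that produces the second pair of numbers in the message, and `run_test_body` is a placeholder callable.

```python
import gc
import torch

def check_for_leak(run_test_body, device: int = 0) -> None:
    """Rough stand-in for the harness's leak check: compare caching-allocator
    usage on one device before and after the test body."""
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)

    run_test_body()

    gc.collect()                      # drop dead Python references first
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)

    if after > before:
        raise RuntimeError(
            f"possible leak on device {device}: allocator went from "
            f"{before} to {after} bytes"
        )
```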
2025-12-04T13:38:32.1064940Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1065139Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1065497Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1065612Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1065823Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1065990Z [rank2]:E1204 13:18:55.476000 394286 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1066029Z dist init r=2, world=4 2025-12-04T13:38:32.1066179Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1066337Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1066628Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1066787Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1067087Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1067213Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1067489Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1067638Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1067913Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1068076Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1068356Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1068494Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1068775Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1068923Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1069418Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17425235968. 2025-12-04T13:38:32.1069531Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1069762Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1070124Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1070238Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1070453Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1070638Z [rank3]:E1204 13:18:55.502000 394287 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1070678Z dist init r=3, world=4 2025-12-04T13:38:32.1071014Z [rank1]:[W1204 13:18:55.613843353 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1071346Z [rank0]:[W1204 13:18:55.635737655 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1071690Z [rank2]:[W1204 13:18:55.692616914 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1072017Z [rank3]:[W1204 13:18:55.748095530 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1072058Z FAILED [22.8387s] [100%] 2025-12-04T13:38:32.1072060Z 2025-12-04T13:38:32.1072131Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1072233Z __ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda ___ 2025-12-04T13:38:32.1072279Z Traceback (most recent call last): 2025-12-04T13:38:32.1072449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1072492Z self._join_processes(fn) 2025-12-04T13:38:32.1072672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1072725Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1072905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1072947Z raise RuntimeError(error) 2025-12-04T13:38:32.1073031Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1073076Z Traceback (most recent call last): 2025-12-04T13:38:32.1073252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1073294Z getattr(self, test_name)() 2025-12-04T13:38:32.1073455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1073491Z fn() 2025-12-04T13:38:32.1073643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1073686Z method(*args, **kwargs) 2025-12-04T13:38:32.1073839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1073880Z method(*args, **kwargs) 2025-12-04T13:38:32.1074032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1074072Z with policy(): 2025-12-04T13:38:32.1074224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1074268Z raise RuntimeError(msg) 2025-12-04T13:38:32.1074634Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 
2025-12-04T13:38:32.1074636Z 2025-12-04T13:38:32.1074714Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1074947Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1074949Z 2025-12-04T13:38:32.1075039Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1075042Z 2025-12-04T13:38:32.1075103Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1075147Z Traceback (most recent call last): 2025-12-04T13:38:32.1075322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1075363Z getattr(self, test_name)() 2025-12-04T13:38:32.1075526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1075560Z fn() 2025-12-04T13:38:32.1075714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1075753Z method(*args, **kwargs) 2025-12-04T13:38:32.1075906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1075959Z method(*args, **kwargs) 2025-12-04T13:38:32.1076112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1076149Z with policy(): 2025-12-04T13:38:32.1076305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1076345Z raise RuntimeError(msg) 2025-12-04T13:38:32.1076698Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 2025-12-04T13:38:32.1076700Z 2025-12-04T13:38:32.1076774Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1077008Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1077025Z 2025-12-04T13:38:32.1077114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1077116Z 2025-12-04T13:38:32.1077118Z 2025-12-04T13:38:32.1077193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1077284Z Process 0 terminated with exit code 10, terminating remaining processes. 
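The traceback above shows the parent test process (`_join_processes` / `_check_return_codes`) turning a child rank's exit code 10 into the RuntimeError that fails the test. The sketch below shows that general parent/child pattern with the standard library's multiprocessing module; it is a simplified stand-in, not torch's MultiProcessTestCase machinery, and the exit code 10 is simply mirrored from the log.

```python
import multiprocessing as mp
import sys

def worker(rank: int) -> None:
    # A failing per-rank check exits with a non-zero code (10 in the log above).
    sys.exit(10 if rank == 0 else 0)

def run(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(r,)) for r in range(world_size)]
    for p in procs:
        p.start()
    for rank, p in enumerate(procs):
        p.join()
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run()
```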
2025-12-04T13:38:32.1077516Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-60c6062c7f204303.xml - 2025-12-04T13:38:32.1077578Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1077828Z FAILED [22.8387s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1077876Z Traceback (most recent call last): 2025-12-04T13:38:32.1078040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1078083Z getattr(self, test_name)() 2025-12-04T13:38:32.1078245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1078281Z fn() 2025-12-04T13:38:32.1078443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1078486Z method(*args, **kwargs) 2025-12-04T13:38:32.1078638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1078679Z method(*args, **kwargs) 2025-12-04T13:38:32.1078832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1078870Z with policy(): 2025-12-04T13:38:32.1079024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1079064Z raise RuntimeError(msg) 2025-12-04T13:38:32.1079431Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 
2025-12-04T13:38:32.1079433Z 2025-12-04T13:38:32.1079506Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1079772Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1079774Z 2025-12-04T13:38:32.1079876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1079879Z 2025-12-04T13:38:32.1079940Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1079983Z Traceback (most recent call last): 2025-12-04T13:38:32.1080148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1080189Z getattr(self, test_name)() 2025-12-04T13:38:32.1080351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1080387Z fn() 2025-12-04T13:38:32.1080536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1080577Z method(*args, **kwargs) 2025-12-04T13:38:32.1080727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1080769Z method(*args, **kwargs) 2025-12-04T13:38:32.1080934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1080972Z with policy(): 2025-12-04T13:38:32.1081126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1081168Z raise RuntimeError(msg) 2025-12-04T13:38:32.1081521Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 2025-12-04T13:38:32.1081523Z 2025-12-04T13:38:32.1081597Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1081830Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1081834Z 2025-12-04T13:38:32.1081922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1081986Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.1082052Z ====================== 1 failed, 32 deselected in 23.00s ======================= 2025-12-04T13:38:32.1082103Z Got exit code 1 2025-12-04T13:38:32.1082144Z Retrying single test... 
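Two of the warnings emitted during the run above point at their own fixes: barrier() suggests passing `device_id` to `init_process_group`, and ProcessGroupNCCL warns that `destroy_process_group()` was never called before exit. A minimal sketch combining both follows; it assumes a recent torch build where `init_process_group` accepts `device_id` and the usual torchrun-style environment variables (LOCAL_RANK, MASTER_ADDR, etc.), and it is not the test suite's own setup code.

```python
import os
import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    # Passing an indexed device here silences the barrier() device warning.
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", local_rank),
    )
    try:
        dist.barrier()
    finally:
        # Explicit teardown, as the ProcessGroupNCCL warning recommends.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```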
2025-12-04T13:38:32.1082335Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-81f9b84a4e1fc9ed.xml 2025-12-04T13:38:32.1082392Z ============================= test session starts ============================== 2025-12-04T13:38:32.1082507Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1082549Z cachedir: .pytest_cache 2025-12-04T13:38:32.1082710Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1082756Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1082797Z configfile: pytest.ini 2025-12-04T13:38:32.1082974Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1083051Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1083279Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1083324Z Running 1 items in this shard 2025-12-04T13:38:32.1083326Z 2025-12-04T13:38:32.1083635Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda I1204 13:19:12.268000 395481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 395550 2025-12-04T13:38:32.1083804Z I1204 13:19:12.268000 395481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 395551 2025-12-04T13:38:32.1083957Z I1204 13:19:12.269000 395481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 395552 2025-12-04T13:38:32.1084111Z I1204 13:19:12.269000 395481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 395553 2025-12-04T13:38:32.1084695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1084749Z _warn_cpu_init() 2025-12-04T13:38:32.1085329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1085366Z _warn_cpu_init() 2025-12-04T13:38:32.1085864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1085930Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1086431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1086493Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1087065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1087106Z _warn_cpu_init() 2025-12-04T13:38:32.1087609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1087668Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1088250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1088299Z _warn_cpu_init() 2025-12-04T13:38:32.1088596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1088681Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1089174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1089236Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1089535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1089649Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1089938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1090017Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1090307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1090385Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1090881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1090963Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1091252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1091331Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1091825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1091903Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1092193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1092269Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1092554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1092651Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1093143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1093203Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1093491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1093566Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1094843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1094990Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1096259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1096385Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1096629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1096673Z return func(*args, **kwargs) 2025-12-04T13:38:32.1096900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1096942Z return func(*args, **kwargs) 2025-12-04T13:38:32.1098205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1098339Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1098565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1098620Z return func(*args, **kwargs) 2025-12-04T13:38:32.1099893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1100017Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1100245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1100309Z return func(*args, **kwargs) 2025-12-04T13:38:32.1100533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1100573Z return func(*args, **kwargs) 2025-12-04T13:38:32.1100796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1100837Z return func(*args, **kwargs) 2025-12-04T13:38:32.1101058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1101100Z return func(*args, **kwargs) 2025-12-04T13:38:32.1101333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1101374Z return func(*args, **kwargs) 2025-12-04T13:38:32.1101663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1101705Z return func(*args, **kwargs) 2025-12-04T13:38:32.1101850Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1102040Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1102336Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1102495Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1102779Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1102906Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1103200Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1103350Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1103630Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1103777Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1104054Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1104193Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1104475Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1104639Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1105120Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver 
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17475567616. 2025-12-04T13:38:32.1105239Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1105434Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1105808Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1105923Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1106135Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1106313Z [rank2]:E1204 13:19:20.721000 395552 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1106353Z dist init r=2, world=4 2025-12-04T13:38:32.1106491Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1106651Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1106942Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1107095Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1107380Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1107515Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1107796Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1107946Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1108221Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1108370Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1108648Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1108796Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1109074Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1109224Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1109742Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 2025-12-04T13:38:32.1109871Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1110068Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1110426Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1110555Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1110768Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1110932Z [rank0]:E1204 13:19:20.729000 395550 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1110974Z dist init r=0, world=4 2025-12-04T13:38:32.1111111Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1111271Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1111565Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1111736Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1116184Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1116320Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1116600Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1116753Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1117035Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1117184Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1117490Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1117629Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1117911Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1118063Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1118555Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17425235968. 
2025-12-04T13:38:32.1118673Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1118867Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1119241Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1119356Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1119600Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1119764Z [rank3]:E1204 13:19:20.735000 395553 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1119806Z dist init r=3, world=4 2025-12-04T13:38:32.1119945Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1120107Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1120415Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1120570Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1120861Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1120985Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1121265Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1121414Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1121706Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1121856Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1122131Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1122272Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1122573Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1122725Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1123203Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17492344832. 2025-12-04T13:38:32.1123338Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1123537Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1123895Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1124009Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1124220Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1124385Z [rank1]:E1204 13:19:20.786000 395551 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1124435Z dist init r=1, world=4 2025-12-04T13:38:32.1124772Z [rank2]:[W1204 13:19:20.889798945 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1125105Z [rank0]:[W1204 13:19:20.912058172 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1125433Z [rank3]:[W1204 13:19:20.932379265 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1125762Z [rank1]:[W1204 13:19:21.060325642 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1125804Z FAILED [22.7375s] [100%] 2025-12-04T13:38:32.1125806Z 2025-12-04T13:38:32.1125877Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1125977Z __ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda ___ 2025-12-04T13:38:32.1126025Z Traceback (most recent call last): 2025-12-04T13:38:32.1126190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1126236Z self._join_processes(fn) 2025-12-04T13:38:32.1126410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1126468Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1126651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1126694Z raise RuntimeError(error) 2025-12-04T13:38:32.1126787Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1126834Z Traceback (most recent call last): 2025-12-04T13:38:32.1126999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1127041Z getattr(self, test_name)() 2025-12-04T13:38:32.1127205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1127240Z fn() 2025-12-04T13:38:32.1127396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1127449Z method(*args, **kwargs) 2025-12-04T13:38:32.1127605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1127646Z method(*args, **kwargs) 2025-12-04T13:38:32.1127800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1127838Z with policy(): 2025-12-04T13:38:32.1127994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1128035Z raise RuntimeError(msg) 2025-12-04T13:38:32.1128392Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 
2025-12-04T13:38:32.1128406Z 2025-12-04T13:38:32.1128482Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1128718Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1128721Z 2025-12-04T13:38:32.1128812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1128814Z 2025-12-04T13:38:32.1128816Z 2025-12-04T13:38:32.1128893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1128985Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1129221Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-81f9b84a4e1fc9ed.xml - 2025-12-04T13:38:32.1129286Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1129537Z FAILED [22.7375s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1129623Z Traceback (most recent call last): 2025-12-04T13:38:32.1129807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1129852Z getattr(self, test_name)() 2025-12-04T13:38:32.1130015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1130055Z fn() 2025-12-04T13:38:32.1130210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1130250Z method(*args, **kwargs) 2025-12-04T13:38:32.1130406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1130446Z method(*args, **kwargs) 2025-12-04T13:38:32.1130600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1130649Z with policy(): 2025-12-04T13:38:32.1130807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1130847Z raise RuntimeError(msg) 2025-12-04T13:38:32.1131204Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17628659712. 2025-12-04T13:38:32.1131206Z 2025-12-04T13:38:32.1131281Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1131532Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1131534Z 2025-12-04T13:38:32.1131620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1131689Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.1131753Z ====================== 1 failed, 32 deselected in 22.90s ======================= 2025-12-04T13:38:32.1131793Z Got exit code 1 2025-12-04T13:38:32.1131977Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda 2025-12-04T13:38:32.1132105Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1132296Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82617ca858ac6daf.xml 2025-12-04T13:38:32.1132368Z ============================= test session starts ============================== 2025-12-04T13:38:32.1132485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1132527Z cachedir: .pytest_cache 2025-12-04T13:38:32.1132688Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1132735Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1132778Z configfile: pytest.ini 2025-12-04T13:38:32.1132944Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1133020Z collecting ... collected 60 items / 14 deselected / 46 selected 2025-12-04T13:38:32.1133073Z stepcurrent: skipping 14 already run items. 2025-12-04T13:38:32.1133120Z Running 19 items in this shard 2025-12-04T13:38:32.1133121Z 2025-12-04T13:38:32.1133433Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda I1204 13:19:37.497000 396747 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 396816 2025-12-04T13:38:32.1133593Z I1204 13:19:37.498000 396747 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 396817 2025-12-04T13:38:32.1133758Z I1204 13:19:37.499000 396747 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 396818 2025-12-04T13:38:32.1133912Z I1204 13:19:37.499000 396747 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 396819 2025-12-04T13:38:32.1134499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1134539Z _warn_cpu_init() 2025-12-04T13:38:32.1135044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1135107Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1135689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1135742Z _warn_cpu_init() 2025-12-04T13:38:32.1136234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1136298Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1136870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1136922Z _warn_cpu_init() 2025-12-04T13:38:32.1137413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1137471Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1138044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1138084Z _warn_cpu_init() 2025-12-04T13:38:32.1138395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1138484Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1138975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1139036Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1139339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1139426Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1139941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1140020Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1140307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1140387Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1140678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1140756Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1141043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1141134Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1141422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1141497Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1141991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1142052Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1142344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1142392Z return func(*args, **kwargs) 2025-12-04T13:38:32.1142690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1142772Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1143262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1143324Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1143626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1143701Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1143932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1143974Z return func(*args, **kwargs) 2025-12-04T13:38:32.1144200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1144242Z return func(*args, **kwargs) 2025-12-04T13:38:32.1144480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1144521Z return func(*args, **kwargs) 2025-12-04T13:38:32.1144745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1144786Z return func(*args, **kwargs) 2025-12-04T13:38:32.1145008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1145049Z return func(*args, **kwargs) 2025-12-04T13:38:32.1145272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1145315Z return func(*args, **kwargs) 2025-12-04T13:38:32.1145546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1145589Z return func(*args, **kwargs) 2025-12-04T13:38:32.1145811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1145856Z return func(*args, **kwargs) 2025-12-04T13:38:32.1146003Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1146169Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1146463Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1146627Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1146928Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1147056Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1147337Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1147489Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1147771Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1147929Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1148208Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1148344Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1148626Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1148790Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1149275Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1149394Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1149630Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1150009Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1150124Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1150342Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1150508Z [rank2]:E1204 13:19:46.095000 396818 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1150546Z dist init r=2, world=4 2025-12-04T13:38:32.1150686Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1150846Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1151137Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1151306Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1151595Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1151719Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1151999Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1152162Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1152439Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1152587Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1152862Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1153015Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1153292Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1153444Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1153931Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1154066Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1154264Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1154622Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1154738Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1154949Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1155118Z [rank0]:E1204 13:19:46.105000 396816 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1155160Z dist init r=0, world=4 2025-12-04T13:38:32.1155296Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1155469Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1155757Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1155913Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1156199Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1156327Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1156615Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1156766Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.1157045Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1157203Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1157483Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1157620Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1157900Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1158047Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1158527Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1158655Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1158851Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1159208Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1159322Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1159538Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1159742Z [rank1]:E1204 13:19:46.174000 396817 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1159794Z dist init r=1, world=4 2025-12-04T13:38:32.1159934Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1160092Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1160381Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1160535Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.1160840Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1160963Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1161244Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1161394Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1161684Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1161833Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1162113Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1162251Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1162528Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1162693Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1163172Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:38:32.1163286Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1163482Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1163836Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1163954Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1164174Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1164340Z [rank3]:E1204 13:19:46.176000 396819 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1164378Z dist init r=3, world=4 2025-12-04T13:38:32.1164715Z [rank0]:[W1204 13:19:46.302651884 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1165060Z [rank2]:[W1204 13:19:46.303122638 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1165389Z [rank1]:[W1204 13:19:46.418872470 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1165716Z [rank3]:[W1204 13:19:46.544867811 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1165767Z FAILED [22.8381s] [ 5%] 2025-12-04T13:38:32.1165770Z 2025-12-04T13:38:32.1165830Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1165930Z ___ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda ___ 2025-12-04T13:38:32.1165980Z Traceback (most recent call last): 2025-12-04T13:38:32.1166150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1166194Z self._join_processes(fn) 2025-12-04T13:38:32.1166371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1166425Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1166605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1166649Z raise RuntimeError(error) 2025-12-04T13:38:32.1166743Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1166788Z Traceback (most recent call last): 2025-12-04T13:38:32.1166953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1166996Z getattr(self, test_name)() 2025-12-04T13:38:32.1167158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1167192Z fn() 2025-12-04T13:38:32.1167347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1167387Z method(*args, **kwargs) 2025-12-04T13:38:32.1167542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1167583Z method(*args, **kwargs) 2025-12-04T13:38:32.1167739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1167776Z with policy(): 2025-12-04T13:38:32.1167933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1167975Z raise RuntimeError(msg) 2025-12-04T13:38:32.1168340Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:38:32.1168343Z 2025-12-04T13:38:32.1168422Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1168655Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1168659Z 2025-12-04T13:38:32.1168748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1168750Z 2025-12-04T13:38:32.1168752Z 2025-12-04T13:38:32.1168825Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1168925Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1169161Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-82617ca858ac6daf.xml - 2025-12-04T13:38:32.1169224Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1169470Z FAILED [22.8381s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1169535Z Traceback (most recent call last): 2025-12-04T13:38:32.1169741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1169785Z getattr(self, test_name)() 2025-12-04T13:38:32.1169951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1169986Z fn() 2025-12-04T13:38:32.1170142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1170182Z method(*args, **kwargs) 2025-12-04T13:38:32.1170336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1170375Z method(*args, **kwargs) 2025-12-04T13:38:32.1170527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1170565Z with policy(): 2025-12-04T13:38:32.1170734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1170774Z raise RuntimeError(msg) 2025-12-04T13:38:32.1171130Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1171133Z 2025-12-04T13:38:32.1171207Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1171441Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1171444Z 2025-12-04T13:38:32.1171533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1171599Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.1171664Z ====================== 1 failed, 14 deselected in 23.00s ======================= 2025-12-04T13:38:32.1171703Z Got exit code 1 2025-12-04T13:38:32.1171746Z Retrying single test... 2025-12-04T13:38:32.1171949Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-76a7782323b97443.xml 2025-12-04T13:38:32.1172011Z ============================= test session starts ============================== 2025-12-04T13:38:32.1172126Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1172170Z cachedir: .pytest_cache 2025-12-04T13:38:32.1172329Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1172380Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1172422Z configfile: pytest.ini 2025-12-04T13:38:32.1172589Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1172664Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1172902Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1172946Z Running 1 items in this shard 2025-12-04T13:38:32.1172947Z 2025-12-04T13:38:32.1173253Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda I1204 13:20:02.914000 398157 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 398226 2025-12-04T13:38:32.1173412Z I1204 13:20:02.915000 398157 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 398227 2025-12-04T13:38:32.1173580Z I1204 13:20:02.915000 398157 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 398228 2025-12-04T13:38:32.1173733Z I1204 13:20:02.916000 398157 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 398229 2025-12-04T13:38:32.1174314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1174354Z _warn_cpu_init() 2025-12-04T13:38:32.1174850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1174927Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1175505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1175541Z _warn_cpu_init() 2025-12-04T13:38:32.1176120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1176158Z _warn_cpu_init() 2025-12-04T13:38:32.1176662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1176725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1177211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1177284Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1177852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1177891Z _warn_cpu_init() 2025-12-04T13:38:32.1178394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1178452Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1178745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1178828Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1179317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1179391Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1179719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1179799Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1180085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1180167Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1180456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1180538Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1181043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1181104Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1181394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1181471Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1181771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1181846Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1182134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1182212Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1182702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1182778Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1183069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1183114Z return func(*args, **kwargs) 2025-12-04T13:38:32.1183402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1183478Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1183723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1183767Z return func(*args, **kwargs) 2025-12-04T13:38:32.1183992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1184034Z return func(*args, **kwargs) 2025-12-04T13:38:32.1184256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1184298Z return func(*args, **kwargs) 2025-12-04T13:38:32.1184519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1184563Z return func(*args, **kwargs) 2025-12-04T13:38:32.1184784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1184824Z return func(*args, **kwargs) 2025-12-04T13:38:32.1185057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1185097Z return func(*args, **kwargs) 2025-12-04T13:38:32.1185319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1185358Z return func(*args, **kwargs) 2025-12-04T13:38:32.1185578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.1185619Z return func(*args, **kwargs) 2025-12-04T13:38:32.1185766Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1185946Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1186241Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1186397Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1186687Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1186829Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1187111Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1187262Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1187540Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1187690Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1187976Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1188115Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1188394Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1188542Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1189024Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:38:32.1189142Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1189353Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1189752Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1189866Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1190081Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1190266Z [rank3]:E1204 13:20:11.623000 398229 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1190308Z dist init r=3, world=4 2025-12-04T13:38:32.1190446Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1190608Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1190894Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1191068Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1191355Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1191483Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1191765Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1191915Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1192197Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1192365Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1192643Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1192778Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1193057Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1193208Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1193699Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1193817Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1194014Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1194371Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1194486Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1194711Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1194876Z [rank1]:E1204 13:20:11.666000 398227 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1194914Z dist init r=1, world=4 2025-12-04T13:38:32.1195052Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1195211Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1195512Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1195666Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1195954Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1196080Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1196358Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1196518Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.1196795Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1196944Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1197219Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1197357Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1197638Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1197786Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1198279Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1198395Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1198594Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1198963Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1199079Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1199291Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1199456Z [rank0]:E1204 13:20:11.680000 398226 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1199507Z dist init r=0, world=4 2025-12-04T13:38:32.1199673Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1199836Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1200127Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1200283Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.1200570Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1200711Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1200994Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1201143Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1201423Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1201570Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1201850Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1201988Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1202291Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1202441Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1202915Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1203032Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1203242Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1203600Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1203712Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1203940Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1204105Z [rank2]:E1204 13:20:11.697000 398228 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1204144Z dist init r=2, world=4 2025-12-04T13:38:32.1204479Z [rank3]:[W1204 13:20:11.817392678 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1204806Z [rank1]:[W1204 13:20:11.930174227 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1205135Z [rank0]:[W1204 13:20:12.037192689 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1205474Z [rank2]:[W1204 13:20:12.087850407 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1205516Z FAILED [23.0372s] [100%] 2025-12-04T13:38:32.1205518Z 2025-12-04T13:38:32.1205576Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1205676Z ___ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda ___ 2025-12-04T13:38:32.1205724Z Traceback (most recent call last): 2025-12-04T13:38:32.1205889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1205936Z self._join_processes(fn) 2025-12-04T13:38:32.1206111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1206166Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1206356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1206401Z raise RuntimeError(error) 2025-12-04T13:38:32.1206481Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1206527Z Traceback (most recent call last): 2025-12-04T13:38:32.1206688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1206732Z getattr(self, test_name)() 2025-12-04T13:38:32.1206891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1206929Z fn() 2025-12-04T13:38:32.1207080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1207122Z method(*args, **kwargs) 2025-12-04T13:38:32.1207286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1207328Z method(*args, **kwargs) 2025-12-04T13:38:32.1207479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1207518Z with policy(): 2025-12-04T13:38:32.1207673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1207717Z raise RuntimeError(msg) 2025-12-04T13:38:32.1208085Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:38:32.1208089Z 2025-12-04T13:38:32.1208165Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1208400Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1208402Z 2025-12-04T13:38:32.1208489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1208491Z 2025-12-04T13:38:32.1208493Z 2025-12-04T13:38:32.1208569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1208657Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1208903Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-76a7782323b97443.xml - 2025-12-04T13:38:32.1208965Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1209215Z FAILED [23.0372s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1209262Z Traceback (most recent call last): 2025-12-04T13:38:32.1209428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1209472Z getattr(self, test_name)() 2025-12-04T13:38:32.1209657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1209695Z fn() 2025-12-04T13:38:32.1209850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1209892Z method(*args, **kwargs) 2025-12-04T13:38:32.1210045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1210087Z method(*args, **kwargs) 2025-12-04T13:38:32.1210252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1210291Z with policy(): 2025-12-04T13:38:32.1210443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1210486Z raise RuntimeError(msg) 2025-12-04T13:38:32.1210837Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1210844Z 2025-12-04T13:38:32.1210919Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1211164Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1211167Z 2025-12-04T13:38:32.1211254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1211319Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
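The warnings repeated throughout the run above all point at the same pattern: FSDP is handed a bare `device_id` of `cuda` with no index while the module is still on CPU, `barrier()` has to guess the device because `init_process_group` was not given one, and `destroy_process_group()` is never called before the workers exit. The lines below are a minimal sketch, not the test's actual code, of a multi-process FSDP setup that follows those recommendations; the stand-in model, world size, and rendezvous settings are assumptions for illustration only.

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def run(rank: int, world_size: int) -> None:
    # Assumed single-node rendezvous; the real test suite uses its own launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Bind this process to one GPU up front so collectives and FSDP init
    # do not fall back to "the current device" heuristics.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),  # silences the barrier() warning
    )

    # Stand-in model; the real test wraps a mixture-of-experts module.
    model = torch.nn.Linear(8, 8)

    # An explicit device index avoids the "does not have an explicit index"
    # warning and moves the CPU module to the GPU for sharding initialization.
    fsdp_model = FSDP(model, device_id=rank)

    out = fsdp_model(torch.randn(4, 8, device=f"cuda:{rank}"))
    out.sum().backward()

    # Tearing the group down explicitly avoids the ProcessGroupNCCL warning
    # about destroy_process_group() not being called before program exit.
    dist.destroy_process_group()


if __name__ == "__main__":
    world = torch.cuda.device_count()
    torch.multiprocessing.spawn(run, args=(world,), nprocs=world)

This only addresses the warnings; when the repro command above is run with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, the test must also leave caching-allocator and driver allocations balanced, which is the actual failure reported here.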
2025-12-04T13:38:32.1211382Z ====================== 1 failed, 32 deselected in 23.20s ======================= 2025-12-04T13:38:32.1211420Z Got exit code 1 2025-12-04T13:38:32.1211460Z Retrying single test... 2025-12-04T13:38:32.1211651Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-729675f2485b5e8a.xml 2025-12-04T13:38:32.1211724Z ============================= test session starts ============================== 2025-12-04T13:38:32.1211839Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1211879Z cachedir: .pytest_cache 2025-12-04T13:38:32.1212040Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1212086Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1212128Z configfile: pytest.ini 2025-12-04T13:38:32.1212293Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1212369Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1212593Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1212652Z Running 1 items in this shard 2025-12-04T13:38:32.1212655Z 2025-12-04T13:38:32.1212961Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda I1204 13:20:28.531000 399567 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 399636 2025-12-04T13:38:32.1213117Z I1204 13:20:28.531000 399567 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 399637 2025-12-04T13:38:32.1213271Z I1204 13:20:28.532000 399567 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 399638 2025-12-04T13:38:32.1213423Z I1204 13:20:28.532000 399567 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 399639 2025-12-04T13:38:32.1214174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1214214Z _warn_cpu_init() 2025-12-04T13:38:32.1214721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1214788Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1215374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1215414Z _warn_cpu_init() 2025-12-04T13:38:32.1215903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1215966Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1216552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1216590Z _warn_cpu_init() 2025-12-04T13:38:32.1217084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1217142Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1217716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1217773Z _warn_cpu_init() 2025-12-04T13:38:32.1218062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1218147Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1218432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1218515Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1219017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1219077Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1219364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1219443Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1219985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1220043Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1220333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1220410Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1220699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1220790Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1221278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1221338Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1221625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1221701Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1222005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1222051Z return func(*args, **kwargs) 2025-12-04T13:38:32.1222341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1222423Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1222913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1222972Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1223260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1223347Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1223578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1223620Z return func(*args, **kwargs) 2025-12-04T13:38:32.1223847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1223889Z return func(*args, **kwargs) 2025-12-04T13:38:32.1224113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1224166Z return func(*args, **kwargs) 2025-12-04T13:38:32.1224389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1224431Z return func(*args, **kwargs) 2025-12-04T13:38:32.1224653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1224695Z return func(*args, **kwargs) 2025-12-04T13:38:32.1224914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1224969Z return func(*args, **kwargs) 2025-12-04T13:38:32.1225191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1225232Z return func(*args, **kwargs) 2025-12-04T13:38:32.1225453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.1225494Z return func(*args, **kwargs) 2025-12-04T13:38:32.1225642Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1225807Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1226108Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1226266Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1226554Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1226678Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1226972Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1227123Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1227401Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1227558Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1227837Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1227976Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1228255Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1228417Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1228897Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:38:32.1229014Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1229222Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1229741Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1229861Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1230073Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1230239Z [rank1]:E1204 13:20:37.229000 399637 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1230279Z dist init r=1, world=4 2025-12-04T13:38:32.1230434Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1230594Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1230888Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1231043Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1231327Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1231455Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1231736Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1231899Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1232177Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1232326Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1232603Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1232743Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1233036Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1233184Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1233661Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1233795Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1233996Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1234355Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1234467Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1234682Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1234857Z [rank0]:E1204 13:20:37.289000 399636 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1234897Z dist init r=0, world=4 2025-12-04T13:38:32.1235034Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1235197Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1235483Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1235637Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1235924Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1236048Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1236340Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1236487Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.1236767Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1236916Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1237204Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1237344Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1237620Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1237769Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1238260Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1238377Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1238573Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1238935Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1239061Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1239271Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1239437Z [rank3]:E1204 13:20:37.302000 399639 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1239476Z dist init r=3, world=4 2025-12-04T13:38:32.1239658Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1239817Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1240104Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1240259Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.1240558Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1240684Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1240961Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1241111Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1241400Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1241550Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1241826Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1241963Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1242256Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1242404Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1242880Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1242994Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1243191Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1243567Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1243684Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1243897Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1244059Z [rank2]:E1204 13:20:37.313000 399638 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1244099Z dist init r=2, world=4 2025-12-04T13:38:32.1244435Z [rank1]:[W1204 13:20:37.400194084 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1244777Z [rank0]:[W1204 13:20:37.619990936 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1245105Z [rank2]:[W1204 13:20:37.666999629 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1245432Z [rank3]:[W1204 13:20:37.694596839 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1245476Z FAILED [22.9376s] [100%] 2025-12-04T13:38:32.1245478Z 2025-12-04T13:38:32.1245534Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1245649Z ___ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda ___ 2025-12-04T13:38:32.1245695Z Traceback (most recent call last): 2025-12-04T13:38:32.1245865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1245908Z self._join_processes(fn) 2025-12-04T13:38:32.1246084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1246138Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1246319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1246375Z raise RuntimeError(error) 2025-12-04T13:38:32.1246457Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1246501Z Traceback (most recent call last): 2025-12-04T13:38:32.1246665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1246708Z getattr(self, test_name)() 2025-12-04T13:38:32.1246871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1246905Z fn() 2025-12-04T13:38:32.1247059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1247100Z method(*args, **kwargs) 2025-12-04T13:38:32.1247254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1247308Z method(*args, **kwargs) 2025-12-04T13:38:32.1247460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1247498Z with policy(): 2025-12-04T13:38:32.1247652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1247696Z raise RuntimeError(msg) 2025-12-04T13:38:32.1248049Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:38:32.1248051Z 2025-12-04T13:38:32.1248128Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1248364Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1248367Z 2025-12-04T13:38:32.1248455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1248458Z 2025-12-04T13:38:32.1248459Z 2025-12-04T13:38:32.1248534Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1248634Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1248866Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-729675f2485b5e8a.xml - 2025-12-04T13:38:32.1248927Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1249174Z FAILED [22.9376s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1249220Z Traceback (most recent call last): 2025-12-04T13:38:32.1249387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1249429Z getattr(self, test_name)() 2025-12-04T13:38:32.1249643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1249678Z fn() 2025-12-04T13:38:32.1249831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1249871Z method(*args, **kwargs) 2025-12-04T13:38:32.1250023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1250062Z method(*args, **kwargs) 2025-12-04T13:38:32.1250229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1250266Z with policy(): 2025-12-04T13:38:32.1250422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1250463Z raise RuntimeError(msg) 2025-12-04T13:38:32.1250821Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1250823Z 2025-12-04T13:38:32.1250899Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1251130Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1251146Z 2025-12-04T13:38:32.1251234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1251297Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
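The failure reported just above is raised by the suite's CUDA memory-leak guard, which this shard enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it records per-device caching-allocator usage before the test body runs and raises afterwards if the numbers have grown, which is what produces the "Caching allocator allocated memory was 512 and is now reported as ..." message. The following is only a minimal sketch of that kind of before/after comparison, not the actual CudaMemoryLeakCheck used by torch/testing/_internal/common_utils.py; the class name and the simple greater-than comparison are illustrative assumptions.

    # Minimal sketch of a before/after caching-allocator comparison; not PyTorch's
    # real leak checker, just the shape of the check that fails in the log above.
    import torch

    class SimpleCudaLeakCheck:
        """Raise if caching-allocator usage grows across the guarded block."""

        def __enter__(self):
            torch.cuda.synchronize()
            self.before = [
                torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())
            ]
            return self

        def __exit__(self, exc_type, exc, tb):
            if exc_type is not None:
                return False  # never mask the test's own exception
            torch.cuda.synchronize()
            for device, before in enumerate(self.before):
                after = torch.cuda.memory_allocated(device)
                if after > before:
                    raise RuntimeError(
                        f"possible leak on device {device}: "
                        f"allocated memory was {before} and is now {after}"
                    )
            return False

    if torch.cuda.is_available():
        with SimpleCudaLeakCheck():
            x = torch.ones(1024, device="cuda")
            del x  # freeing the tensor keeps the check green; keeping it would trip it

Rerunning the repro command printed above with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 re-enables the same guard outside CI.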
2025-12-04T13:38:32.1251362Z ====================== 1 failed, 32 deselected in 23.10s ======================= 2025-12-04T13:38:32.1251401Z Got exit code 1 2025-12-04T13:38:32.1251584Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda 2025-12-04T13:38:32.1251711Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1251902Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5eacd8724e83b056.xml 2025-12-04T13:38:32.1251960Z ============================= test session starts ============================== 2025-12-04T13:38:32.1252076Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1252119Z cachedir: .pytest_cache 2025-12-04T13:38:32.1252280Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1252325Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1252367Z configfile: pytest.ini 2025-12-04T13:38:32.1252545Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1252621Z collecting ... collected 60 items / 15 deselected / 45 selected 2025-12-04T13:38:32.1252676Z stepcurrent: skipping 15 already run items. 2025-12-04T13:38:32.1252720Z Running 18 items in this shard 2025-12-04T13:38:32.1252722Z 2025-12-04T13:38:32.1253025Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda I1204 13:20:54.200000 400977 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 401046 2025-12-04T13:38:32.1253182Z I1204 13:20:54.200000 400977 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 401047 2025-12-04T13:38:32.1253346Z I1204 13:20:54.201000 400977 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 401048 2025-12-04T13:38:32.1253498Z I1204 13:20:54.202000 400977 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 401049 2025-12-04T13:38:32.1254080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1254130Z _warn_cpu_init() 2025-12-04T13:38:32.1254701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1254739Z _warn_cpu_init() 2025-12-04T13:38:32.1255038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1255077Z _init_core_state( 2025-12-04T13:38:32.1255569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1255644Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1255941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1255977Z _init_core_state( 2025-12-04T13:38:32.1256468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1256529Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1257115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1257152Z _warn_cpu_init() 2025-12-04T13:38:32.1257450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1257489Z _init_core_state( 2025-12-04T13:38:32.1257988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1258049Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1258617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1258667Z _warn_cpu_init() 2025-12-04T13:38:32.1259157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1259214Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1259508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1259545Z _init_core_state( 2025-12-04T13:38:32.1260073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1260148Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1260439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1260481Z return func(*args, **kwargs) 2025-12-04T13:38:32.1260969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1261029Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1261526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1261585Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1261813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1261858Z return func(*args, **kwargs) 2025-12-04T13:38:32.1262083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1262126Z return func(*args, **kwargs) 2025-12-04T13:38:32.1262364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1262405Z return func(*args, **kwargs) 2025-12-04T13:38:32.1262629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1262669Z return func(*args, **kwargs) 2025-12-04T13:38:32.1262889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1262942Z return func(*args, **kwargs) 2025-12-04T13:38:32.1263163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1263202Z return func(*args, **kwargs) 2025-12-04T13:38:32.1263424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1263464Z return func(*args, **kwargs) 2025-12-04T13:38:32.1263686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1263726Z return func(*args, **kwargs) 2025-12-04T13:38:32.1263873Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1264036Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1264342Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1264501Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1264788Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1264915Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1265191Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1265344Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1265637Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1265786Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1266062Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1266200Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1266489Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1266639Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1267118Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1267235Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1267442Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1267799Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1267914Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1268129Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1268293Z [rank2]:E1204 13:21:02.954000 401048 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1268345Z dist init r=2, world=4 2025-12-04T13:38:32.1268482Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1268643Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1268933Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1269090Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1269383Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1269510Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1269831Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1269990Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1270268Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1270414Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1270691Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1270841Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1271119Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1271269Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1271741Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:38:32.1271870Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1272066Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1272417Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1272532Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1272747Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1272927Z [rank0]:E1204 13:21:02.963000 401046 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1272965Z dist init r=0, world=4 2025-12-04T13:38:32.1273106Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1273265Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1273551Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1273705Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1273993Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1274128Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1274405Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1274554Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1274833Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1274983Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1275269Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1275407Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1275685Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1275843Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1276320Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1276434Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1276633Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1276985Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1277109Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1277323Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1277486Z [rank1]:E1204 13:21:02.964000 401047 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1277525Z dist init r=1, world=4 2025-12-04T13:38:32.1277662Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1277824Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1278110Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1278264Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1278561Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1278686Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1278963Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1279111Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.1279402Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1279549Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1279859Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1279995Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1280287Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1280438Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1280911Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1281026Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1281223Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1281594Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1281708Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1281918Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1282083Z [rank3]:E1204 13:21:02.966000 401049 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1282122Z dist init r=3, world=4 2025-12-04T13:38:32.1282459Z [rank1]:[W1204 13:21:03.225081823 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1282801Z [rank2]:[W1204 13:21:03.237153496 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1283134Z [rank3]:[W1204 13:21:03.259860422 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1283465Z [rank0]:[W1204 13:21:03.263817071 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1283506Z FAILED [23.1379s] [ 5%] 2025-12-04T13:38:32.1283509Z 2025-12-04T13:38:32.1283582Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1283682Z _____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda _____ 2025-12-04T13:38:32.1283730Z Traceback (most recent call last): 2025-12-04T13:38:32.1283894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1283939Z self._join_processes(fn) 2025-12-04T13:38:32.1284112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1284167Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1284357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1284405Z raise RuntimeError(error) 2025-12-04T13:38:32.1284483Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1284531Z Traceback (most recent call last): 2025-12-04T13:38:32.1284693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1284737Z getattr(self, test_name)() 2025-12-04T13:38:32.1284896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1284933Z fn() 2025-12-04T13:38:32.1285085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1285127Z method(*args, **kwargs) 2025-12-04T13:38:32.1285283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1285333Z method(*args, **kwargs) 2025-12-04T13:38:32.1285486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1285523Z with policy(): 2025-12-04T13:38:32.1285680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1285720Z raise RuntimeError(msg) 2025-12-04T13:38:32.1286070Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:38:32.1286073Z 2025-12-04T13:38:32.1286148Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1286377Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1286380Z 2025-12-04T13:38:32.1286467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1286470Z 2025-12-04T13:38:32.1286474Z 2025-12-04T13:38:32.1286560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1286651Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1286885Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5eacd8724e83b056.xml - 2025-12-04T13:38:32.1286947Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1287187Z FAILED [23.1379s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1287235Z Traceback (most recent call last): 2025-12-04T13:38:32.1287401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1287455Z getattr(self, test_name)() 2025-12-04T13:38:32.1287618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1287655Z fn() 2025-12-04T13:38:32.1287807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1287848Z method(*args, **kwargs) 2025-12-04T13:38:32.1288001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1288054Z method(*args, **kwargs) 2025-12-04T13:38:32.1288206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1288245Z with policy(): 2025-12-04T13:38:32.1288399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1288442Z raise RuntimeError(msg) 2025-12-04T13:38:32.1288790Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1288794Z 2025-12-04T13:38:32.1288870Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1289097Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1289111Z 2025-12-04T13:38:32.1289197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1289262Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
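Alongside the leak itself, every rank also prints the ProcessGroupNCCL.cpp warning that destroy_process_group() was never called before program exit. Below is only a hedged sketch of the setup/teardown pattern that warning asks for; the MASTER_ADDR/MASTER_PORT defaults and the RANK/WORLD_SIZE environment variables are assumptions for a single-node launch, not values taken from this job.

    # Hedged sketch of explicit process-group setup and teardown; env-var defaults
    # below are illustrative assumptions for a single-node run.
    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        rank = int(os.environ.get("RANK", "0"))
        world_size = int(os.environ.get("WORLD_SIZE", "1"))
        device = torch.device("cuda", rank % torch.cuda.device_count())
        torch.cuda.set_device(device)
        # Passing device_id also addresses the "barrier(): using the device under
        # current context" warning printed earlier by c10d_logger.py.
        dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
        try:
            dist.barrier()
        finally:
            dist.destroy_process_group()  # explicit teardown avoids the exit-time warning

    if __name__ == "__main__":
        main()

Putting the teardown in a finally block means it still runs when the body raises, which is exactly the path these leak-check failures take.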
2025-12-04T13:38:32.1289325Z ====================== 1 failed, 15 deselected in 23.30s ======================= 2025-12-04T13:38:32.1289363Z Got exit code 1 2025-12-04T13:38:32.1289404Z Retrying single test... 2025-12-04T13:38:32.1289625Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3dfb0e9aaec2e3f1.xml 2025-12-04T13:38:32.1289683Z ============================= test session starts ============================== 2025-12-04T13:38:32.1289799Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1289839Z cachedir: .pytest_cache 2025-12-04T13:38:32.1290000Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1290047Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1290090Z configfile: pytest.ini 2025-12-04T13:38:32.1290254Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1290328Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1290564Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1290610Z Running 1 items in this shard 2025-12-04T13:38:32.1290612Z 2025-12-04T13:38:32.1290911Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda I1204 13:21:19.868000 402387 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 402456 2025-12-04T13:38:32.1291067Z I1204 13:21:19.869000 402387 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 402457 2025-12-04T13:38:32.1291221Z I1204 13:21:19.869000 402387 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 402458 2025-12-04T13:38:32.1291385Z I1204 13:21:19.870000 402387 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 402459 2025-12-04T13:38:32.1291968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1292020Z _warn_cpu_init() 2025-12-04T13:38:32.1292318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1292357Z _init_core_state( 2025-12-04T13:38:32.1292855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1292918Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1293489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1293542Z _warn_cpu_init() 2025-12-04T13:38:32.1293839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1293879Z _init_core_state( 2025-12-04T13:38:32.1294372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1294435Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1295020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1295058Z _warn_cpu_init() 2025-12-04T13:38:32.1295358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1295399Z _init_core_state( 2025-12-04T13:38:32.1295901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1295965Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1296532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1296595Z _warn_cpu_init() 2025-12-04T13:38:32.1297087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1297146Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1297635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1297693Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1298005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1298043Z _init_core_state( 2025-12-04T13:38:32.1298533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1298594Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1298884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1298932Z return func(*args, **kwargs) 2025-12-04T13:38:32.1299430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1299492Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1299751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1299797Z return func(*args, **kwargs) 2025-12-04T13:38:32.1300023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1300072Z return func(*args, **kwargs) 2025-12-04T13:38:32.1300296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1300354Z return func(*args, **kwargs) 2025-12-04T13:38:32.1300581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1300623Z return func(*args, **kwargs) 2025-12-04T13:38:32.1300845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1300886Z return func(*args, **kwargs) 2025-12-04T13:38:32.1301109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1301164Z return func(*args, **kwargs) 2025-12-04T13:38:32.1301388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1301429Z return func(*args, **kwargs) 2025-12-04T13:38:32.1301653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1301694Z return func(*args, **kwargs) 2025-12-04T13:38:32.1301843Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1302008Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1302315Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1302479Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1302770Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1302901Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1303181Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1303337Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1303630Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1303783Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1304061Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1304198Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1304481Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1304639Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1305120Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1305236Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1305447Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1305803Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1305919Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1306135Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1306301Z [rank2]:E1204 13:21:28.559000 402458 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1306346Z dist init r=2, world=4 2025-12-04T13:38:32.1306495Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1306658Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1306946Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1307104Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1307392Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1307518Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1307797Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1307957Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1308235Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1308382Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1308667Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1308808Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1309097Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1309249Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1309759Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:38:32.1309893Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1310089Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1310444Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1310563Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1310776Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1310958Z [rank3]:E1204 13:21:28.601000 402459 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1310998Z dist init r=3, world=4 2025-12-04T13:38:32.1311141Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1311300Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1311590Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1311744Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1312035Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1312165Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1312454Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1312605Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1312880Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1313033Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1313328Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1313469Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1313752Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1313902Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1314388Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1314504Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1314703Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1315057Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1315181Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1315400Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1315564Z [rank0]:E1204 13:21:28.601000 402456 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1315607Z dist init r=0, world=4 2025-12-04T13:38:32.1315745Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1315909Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1316196Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1316353Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1316663Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1316788Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1317069Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1317218Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.1317510Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1317658Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1317936Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1318072Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1318363Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1318516Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1318990Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1319106Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1319301Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1319708Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1319828Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1320039Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1320206Z [rank1]:E1204 13:21:28.612000 402457 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1320244Z dist init r=1, world=4 2025-12-04T13:38:32.1320582Z [rank2]:[W1204 13:21:28.725525673 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1320928Z [rank3]:[W1204 13:21:28.887556774 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1321260Z [rank1]:[W1204 13:21:28.894284017 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1321591Z [rank0]:[W1204 13:21:28.927860742 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1321634Z FAILED [22.8393s] [100%] 2025-12-04T13:38:32.1321637Z 2025-12-04T13:38:32.1321696Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1321808Z _____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda _____ 2025-12-04T13:38:32.1321858Z Traceback (most recent call last): 2025-12-04T13:38:32.1322025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1322072Z self._join_processes(fn) 2025-12-04T13:38:32.1322247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1322304Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1322482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1322571Z raise RuntimeError(error) 2025-12-04T13:38:32.1322650Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1322699Z Traceback (most recent call last): 2025-12-04T13:38:32.1322861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1322908Z getattr(self, test_name)() 2025-12-04T13:38:32.1323067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1323106Z fn() 2025-12-04T13:38:32.1323258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1323303Z method(*args, **kwargs) 2025-12-04T13:38:32.1323459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1323515Z method(*args, **kwargs) 2025-12-04T13:38:32.1323672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1323709Z with policy(): 2025-12-04T13:38:32.1323868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1323910Z raise RuntimeError(msg) 2025-12-04T13:38:32.1324261Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1324264Z 2025-12-04T13:38:32.1324340Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1324571Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1324575Z 2025-12-04T13:38:32.1324662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1324664Z 2025-12-04T13:38:32.1324669Z 2025-12-04T13:38:32.1324746Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1324849Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1325087Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3dfb0e9aaec2e3f1.xml - 2025-12-04T13:38:32.1325152Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1325394Z FAILED [22.8393s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1325445Z Traceback (most recent call last): 2025-12-04T13:38:32.1325613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1325658Z getattr(self, test_name)() 2025-12-04T13:38:32.1325833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1325872Z fn() 2025-12-04T13:38:32.1326026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1326071Z method(*args, **kwargs) 2025-12-04T13:38:32.1326224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1326267Z method(*args, **kwargs) 2025-12-04T13:38:32.1326431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1326472Z with policy(): 2025-12-04T13:38:32.1326625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1326670Z raise RuntimeError(msg) 2025-12-04T13:38:32.1327019Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1327024Z 2025-12-04T13:38:32.1327099Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1327329Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1327333Z 2025-12-04T13:38:32.1327430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1327496Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.1327559Z ====================== 1 failed, 32 deselected in 23.00s ======================= 2025-12-04T13:38:32.1327602Z Got exit code 1 2025-12-04T13:38:32.1327642Z Retrying single test... 2025-12-04T13:38:32.1327836Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-111f1001e4d61fdc.xml 2025-12-04T13:38:32.1327894Z ============================= test session starts ============================== 2025-12-04T13:38:32.1328011Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1328053Z cachedir: .pytest_cache 2025-12-04T13:38:32.1328218Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1328266Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1328309Z configfile: pytest.ini 2025-12-04T13:38:32.1328474Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1328553Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1328790Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1328838Z Running 1 items in this shard 2025-12-04T13:38:32.1328841Z 2025-12-04T13:38:32.1329144Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda I1204 13:21:45.461000 403797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 403866 2025-12-04T13:38:32.1329299Z I1204 13:21:45.462000 403797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 403867 2025-12-04T13:38:32.1329456Z I1204 13:21:45.462000 403797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 403868 2025-12-04T13:38:32.1329667Z I1204 13:21:45.463000 403797 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 403869 2025-12-04T13:38:32.1330251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1330290Z _warn_cpu_init() 2025-12-04T13:38:32.1330593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1330648Z _init_core_state( 2025-12-04T13:38:32.1331145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
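The `device_id` warning just above is triggered by passing the bare device "cuda" with no index; the fix it suggests is to pin the current device per rank, or to pass an indexed device, before FSDP initialization. A minimal sketch, assuming LOCAL_RANK is provided by the launcher:

    import os
    import torch

    # Assumed launcher convention; the test harness in this log assigns ranks itself.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    torch.cuda.set_device(local_rank)             # option 1: pin the current device
    device_id = torch.device("cuda", local_rank)  # option 2: pass an indexed device to FSDP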
2025-12-04T13:38:32.1331213Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1331789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1331845Z _warn_cpu_init() 2025-12-04T13:38:32.1332147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1332185Z _init_core_state( 2025-12-04T13:38:32.1332681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1332743Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1333335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1333373Z _warn_cpu_init() 2025-12-04T13:38:32.1333669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1333710Z _init_core_state( 2025-12-04T13:38:32.1334199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1334273Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1334846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
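The CPU-init warning above recommends letting FSDP move the module to the GPU itself by passing `device_id`, which is also what `sync_module_states=True` requires. A minimal sketch under the same assumptions (process group already initialized, LOCAL_RANK from the launcher, and a placeholder `nn.Linear` standing in for the test's mixture-of-experts model):

    import os
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    cpu_module = torch.nn.Linear(8, 8)  # placeholder module, still on CPU

    # FSDP moves the module to cuda:local_rank before sharding, so sharding
    # initialization and sync_module_states both run on the GPU.
    sharded = FSDP(
        cpu_module,
        device_id=torch.device("cuda", local_rank),
        sync_module_states=True,
    )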
2025-12-04T13:38:32.1334886Z _warn_cpu_init() 2025-12-04T13:38:32.1335377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1335448Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1335747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1335785Z _init_core_state( 2025-12-04T13:38:32.1336275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1336344Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1336832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1336892Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1337184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1337232Z return func(*args, **kwargs) 2025-12-04T13:38:32.1337721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1337792Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1338023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1338070Z return func(*args, **kwargs) 2025-12-04T13:38:32.1338296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1338342Z return func(*args, **kwargs) 2025-12-04T13:38:32.1338568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1338609Z return func(*args, **kwargs) 2025-12-04T13:38:32.1338845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1338886Z return func(*args, **kwargs) 2025-12-04T13:38:32.1341337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1341381Z return func(*args, **kwargs) 2025-12-04T13:38:32.1341613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1341673Z return func(*args, **kwargs) 2025-12-04T13:38:32.1341897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1341938Z return func(*args, **kwargs) 2025-12-04T13:38:32.1342162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1342202Z return func(*args, **kwargs) 2025-12-04T13:38:32.1342352Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1342536Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1342831Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1343011Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1343302Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1343427Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1343708Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1343858Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1344137Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1344301Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1344584Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1344725Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1345004Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1345159Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1345634Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1345842Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1346043Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1346421Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1346540Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1346751Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1346921Z [rank2]:E1204 13:21:54.271000 403868 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1346960Z dist init r=2, world=4 2025-12-04T13:38:32.1347103Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1347274Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1347569Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1347723Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1348012Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1348140Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1348418Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1348568Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1348854Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1349006Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1349284Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1349426Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1349740Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1349891Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1350390Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:38:32.1350522Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1350721Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1351073Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1351188Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1351402Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1351581Z [rank1]:E1204 13:21:54.331000 403867 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1351625Z dist init r=1, world=4 2025-12-04T13:38:32.1351763Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1351926Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1352219Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1352377Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1352667Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1352794Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1353088Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1353235Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1353516Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1353665Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1353945Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1354081Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1354378Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1354531Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1355014Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1355134Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1355329Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1355687Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1355818Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1356028Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1356197Z [rank3]:E1204 13:21:54.333000 403869 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1356236Z dist init r=3, world=4 2025-12-04T13:38:32.1356376Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1356536Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1356825Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1356980Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1357282Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1357407Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1357691Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1357842Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.1358125Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1358282Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1358562Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1358713Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1363526Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1363718Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1364206Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1364330Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1364526Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1364907Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1365023Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1365239Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1365409Z [rank0]:E1204 13:21:54.338000 403866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1365451Z dist init r=0, world=4 2025-12-04T13:38:32.1365790Z [rank2]:[W1204 13:21:54.443312702 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1366125Z [rank0]:[W1204 13:21:54.563087262 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1366473Z [rank1]:[W1204 13:21:54.609955015 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1366805Z [rank3]:[W1204 13:21:54.614971930 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1366850Z FAILED [23.0354s] [100%] 2025-12-04T13:38:32.1366853Z 2025-12-04T13:38:32.1366913Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1367015Z _____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda _____ 2025-12-04T13:38:32.1367062Z Traceback (most recent call last): 2025-12-04T13:38:32.1367237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1367281Z self._join_processes(fn) 2025-12-04T13:38:32.1367461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1367544Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1367724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1367784Z raise RuntimeError(error) 2025-12-04T13:38:32.1367864Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1367912Z Traceback (most recent call last): 2025-12-04T13:38:32.1368078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1368123Z getattr(self, test_name)() 2025-12-04T13:38:32.1368285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1368323Z fn() 2025-12-04T13:38:32.1368476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1368521Z method(*args, **kwargs) 2025-12-04T13:38:32.1368673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1368718Z method(*args, **kwargs) 2025-12-04T13:38:32.1368884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1368924Z with policy(): 2025-12-04T13:38:32.1369079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1369122Z raise RuntimeError(msg) 2025-12-04T13:38:32.1369479Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1369481Z 2025-12-04T13:38:32.1369562Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1369829Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1369833Z 2025-12-04T13:38:32.1369922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1369924Z 2025-12-04T13:38:32.1369926Z 2025-12-04T13:38:32.1370005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1370093Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1370349Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-111f1001e4d61fdc.xml - 2025-12-04T13:38:32.1370411Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1370664Z FAILED [23.0354s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1370711Z Traceback (most recent call last): 2025-12-04T13:38:32.1370884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1370925Z getattr(self, test_name)() 2025-12-04T13:38:32.1371089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1371125Z fn() 2025-12-04T13:38:32.1371284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1371326Z method(*args, **kwargs) 2025-12-04T13:38:32.1371492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1371535Z method(*args, **kwargs) 2025-12-04T13:38:32.1371689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1371744Z with policy(): 2025-12-04T13:38:32.1371897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1371940Z raise RuntimeError(msg) 2025-12-04T13:38:32.1372289Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1372291Z 2025-12-04T13:38:32.1372368Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1372596Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1372598Z 2025-12-04T13:38:32.1372690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1372767Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.1372833Z ====================== 1 failed, 32 deselected in 23.18s ======================= 2025-12-04T13:38:32.1372870Z Got exit code 1 2025-12-04T13:38:32.1373050Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda 2025-12-04T13:38:32.1373182Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1373372Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-76607ebe359c7ec7.xml 2025-12-04T13:38:32.1373436Z ============================= test session starts ============================== 2025-12-04T13:38:32.1373552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1373597Z cachedir: .pytest_cache 2025-12-04T13:38:32.1373758Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1373808Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1373850Z configfile: pytest.ini 2025-12-04T13:38:32.1374020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1374106Z collecting ... collected 60 items / 16 deselected / 44 selected 2025-12-04T13:38:32.1374163Z stepcurrent: skipping 16 already run items. 2025-12-04T13:38:32.1374206Z Running 17 items in this shard 2025-12-04T13:38:32.1374208Z 2025-12-04T13:38:32.1374528Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 13:22:11.389000 405207 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 405276 2025-12-04T13:38:32.1374685Z I1204 13:22:11.390000 405207 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 405277 2025-12-04T13:38:32.1374841Z I1204 13:22:11.390000 405207 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 405278 2025-12-04T13:38:32.1374993Z I1204 13:22:11.391000 405207 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 405279 2025-12-04T13:38:32.1375601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1375643Z _warn_cpu_init() 2025-12-04T13:38:32.1375959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1376000Z _init_core_state( 2025-12-04T13:38:32.1376494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1376561Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1377142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1377191Z _warn_cpu_init() 2025-12-04T13:38:32.1377497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1377535Z _init_core_state( 2025-12-04T13:38:32.1378028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1378090Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1378684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1378725Z _warn_cpu_init() 2025-12-04T13:38:32.1379022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1379062Z _init_core_state( 2025-12-04T13:38:32.1379553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1379647Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1380239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1380289Z _warn_cpu_init() 2025-12-04T13:38:32.1380777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1380836Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1381327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1381387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1381684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1381736Z _init_core_state( 2025-12-04T13:38:32.1382227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1382287Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1382581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1382628Z return func(*args, **kwargs) 2025-12-04T13:38:32.1383138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1383199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1383433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1383476Z return func(*args, **kwargs) 2025-12-04T13:38:32.1383703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1383746Z return func(*args, **kwargs) 2025-12-04T13:38:32.1383975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1384016Z return func(*args, **kwargs) 2025-12-04T13:38:32.1384247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1384288Z return func(*args, **kwargs) 2025-12-04T13:38:32.1384524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1384564Z return func(*args, **kwargs) 2025-12-04T13:38:32.1384789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1384841Z return func(*args, **kwargs) 2025-12-04T13:38:32.1385066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1385108Z return func(*args, **kwargs) 2025-12-04T13:38:32.1385329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1385373Z return func(*args, **kwargs) 2025-12-04T13:38:32.1385520Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1385689Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1385999Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1386159Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1386449Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1386577Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1386864Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1387017Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1387297Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1387461Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1387739Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1387881Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1388166Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1388317Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1388815Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1388935Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1389143Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1389516Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1389671Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1389888Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1390057Z [rank2]:E1204 13:22:20.214000 405278 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1390096Z dist init r=2, world=4 2025-12-04T13:38:32.1390252Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1390412Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1390705Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1390860Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1391150Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1391276Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1391560Z [rank3]:E1204 13:22:20.233000 405279 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1391723Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1392001Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1392149Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1392426Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1392567Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1392847Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1392999Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1393499Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
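The UserWarning repeated throughout this session, "FSDP got the argument `device_id` cuda ... which does not have an explicit index", together with the "passed-in `module` is on CPU" warning, points at the fix the messages themselves suggest: pin each rank to its GPU before constructing FSDP, or pass an indexed device as `device_id`. A minimal sketch of that pattern follows; `model` and `local_rank` are placeholders, and this is illustrative rather than a patch to the failing test.

# Per-rank device pinning before FSDP construction, following the guidance in the
# UserWarnings above; `model` and `local_rank` are placeholders for illustration.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_with_fsdp(model: torch.nn.Module, local_rank: int) -> FSDP:
    # Make a bare "cuda" device resolve to an explicit index for this process.
    torch.cuda.set_device(local_rank)
    # An indexed device_id avoids the "does not have an explicit index" warning and
    # moves a CPU-resident module to the GPU for sharding initialization, which is
    # also what the `sync_module_states=True` path requires.
    return FSDP(
        model,
        device_id=torch.device("cuda", local_rank),
        sync_module_states=True,
    )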
2025-12-04T13:38:32.1393630Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1393829Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1394192Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1394307Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1394519Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1394698Z [rank3]:E1204 13:22:20.233000 405279 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1394739Z dist init r=3, world=4 2025-12-04T13:38:32.1394878Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1395039Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1395326Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1395483Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1395771Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1395896Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1396183Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1396335Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1396615Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1396764Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1397047Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1397184Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1397476Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1397624Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1398124Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1398242Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1398438Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1398802Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1398932Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1399147Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1399314Z [rank0]:E1204 13:22:20.243000 405276 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1399355Z dist init r=0, world=4 2025-12-04T13:38:32.1399494Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1399683Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1399973Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1400127Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1400428Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1400551Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1400830Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1400979Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.1401258Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1401407Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1401696Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1401834Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1402125Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1402276Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1402759Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1402877Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1403074Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1403447Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1403563Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1403774Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1403940Z [rank1]:E1204 13:22:20.251000 405277 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1403980Z dist init r=1, world=4 2025-12-04T13:38:32.1404320Z [rank2]:[W1204 13:22:20.375098828 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1404663Z [rank3]:[W1204 13:22:20.418451687 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1404991Z [rank1]:[W1204 13:22:20.511303066 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1405319Z [rank0]:[W1204 13:22:20.514677593 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1405361Z FAILED [23.2380s] [ 5%] 2025-12-04T13:38:32.1405363Z 2025-12-04T13:38:32.1405424Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1405526Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.1405576Z Traceback (most recent call last): 2025-12-04T13:38:32.1405743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1405786Z self._join_processes(fn) 2025-12-04T13:38:32.1405973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1406028Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1406222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1406267Z raise RuntimeError(error) 2025-12-04T13:38:32.1406348Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1406393Z Traceback (most recent call last): 2025-12-04T13:38:32.1406559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1406600Z getattr(self, test_name)() 2025-12-04T13:38:32.1406762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1406799Z fn() 2025-12-04T13:38:32.1406954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1406995Z method(*args, **kwargs) 2025-12-04T13:38:32.1407153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1407204Z method(*args, **kwargs) 2025-12-04T13:38:32.1407359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1407398Z with policy(): 2025-12-04T13:38:32.1407555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1407595Z raise RuntimeError(msg) 2025-12-04T13:38:32.1407958Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:38:32.1407961Z 2025-12-04T13:38:32.1408042Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1408283Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1408286Z 2025-12-04T13:38:32.1408375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1408377Z 2025-12-04T13:38:32.1408436Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1408493Z Traceback (most recent call last): 2025-12-04T13:38:32.1408657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1408701Z getattr(self, test_name)() 2025-12-04T13:38:32.1408861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1408899Z fn() 2025-12-04T13:38:32.1409049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1409094Z method(*args, **kwargs) 2025-12-04T13:38:32.1409244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1409288Z method(*args, **kwargs) 2025-12-04T13:38:32.1409438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1409478Z with policy(): 2025-12-04T13:38:32.1409680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1409725Z raise RuntimeError(msg) 2025-12-04T13:38:32.1410100Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1410116Z 2025-12-04T13:38:32.1410191Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1410429Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1410432Z 2025-12-04T13:38:32.1410519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1410521Z 2025-12-04T13:38:32.1410582Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1410630Z Traceback (most recent call last): 2025-12-04T13:38:32.1410798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1410841Z getattr(self, test_name)() 2025-12-04T13:38:32.1411005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1411056Z fn() 2025-12-04T13:38:32.1411210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1411250Z method(*args, **kwargs) 2025-12-04T13:38:32.1411403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1411443Z method(*args, **kwargs) 2025-12-04T13:38:32.1411597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1411636Z with policy(): 2025-12-04T13:38:32.1411790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1411834Z raise RuntimeError(msg) 2025-12-04T13:38:32.1412193Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1412197Z 2025-12-04T13:38:32.1412272Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1412522Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1412524Z 2025-12-04T13:38:32.1412615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1412617Z 2025-12-04T13:38:32.1412619Z 2025-12-04T13:38:32.1412696Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1412785Z Process 0 terminated with exit code 10, terminating remaining processes. 
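The ProcessGroupNCCL warnings above ("destroy_process_group() was not called before program exit, which can leak resources") are emitted because the worker processes exit without tearing down the default process group. In a standalone distributed script the usual pattern is explicit init and teardown; the sketch below assumes a launcher such as torchrun has set RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT and LOCAL_RANK, and the training body is elided.

# Explicit init/teardown for torch.distributed, addressing the
# "destroy_process_group() was not called before program exit" warning above.
# Assumes the launcher provides RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT,
# LOCAL_RANK in the environment.
import os

import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    # Note: recent PyTorch versions also accept a device_id argument to
    # init_process_group, which additionally silences the "barrier(): using the
    # device under current context" warning seen above.
    dist.init_process_group(backend="nccl")
    try:
        # ... distributed work goes here ...
        dist.barrier()
    finally:
        dist.destroy_process_group()  # explicit teardown avoids the leak warning

if __name__ == "__main__":
    main()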
2025-12-04T13:38:32.1413026Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-76607ebe359c7ec7.xml - 2025-12-04T13:38:32.1413088Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1413346Z FAILED [23.2380s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1413393Z Traceback (most recent call last): 2025-12-04T13:38:32.1413560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1413602Z getattr(self, test_name)() 2025-12-04T13:38:32.1413781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1413818Z fn() 2025-12-04T13:38:32.1413994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1414036Z method(*args, **kwargs) 2025-12-04T13:38:32.1414191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1414230Z method(*args, **kwargs) 2025-12-04T13:38:32.1414385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1414421Z with policy(): 2025-12-04T13:38:32.1414574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1414615Z raise RuntimeError(msg) 2025-12-04T13:38:32.1414976Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
2025-12-04T13:38:32.1414990Z 2025-12-04T13:38:32.1415067Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1415300Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1415302Z 2025-12-04T13:38:32.1415391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1415393Z 2025-12-04T13:38:32.1415452Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1415499Z Traceback (most recent call last): 2025-12-04T13:38:32.1415665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1415708Z getattr(self, test_name)() 2025-12-04T13:38:32.1415870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1415908Z fn() 2025-12-04T13:38:32.1416059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1416102Z method(*args, **kwargs) 2025-12-04T13:38:32.1416262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1416305Z method(*args, **kwargs) 2025-12-04T13:38:32.1416455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1416494Z with policy(): 2025-12-04T13:38:32.1416646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1416691Z raise RuntimeError(msg) 2025-12-04T13:38:32.1417048Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1417051Z 2025-12-04T13:38:32.1417124Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1417362Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1417365Z 2025-12-04T13:38:32.1417451Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1417453Z 2025-12-04T13:38:32.1417532Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1417583Z Traceback (most recent call last): 2025-12-04T13:38:32.1417746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1417806Z getattr(self, test_name)() 2025-12-04T13:38:32.1417967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1418006Z fn() 2025-12-04T13:38:32.1418157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1418203Z method(*args, **kwargs) 2025-12-04T13:38:32.1418355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1418399Z method(*args, **kwargs) 2025-12-04T13:38:32.1418551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1418592Z with policy(): 2025-12-04T13:38:32.1418744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1418799Z raise RuntimeError(msg) 2025-12-04T13:38:32.1419154Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1419160Z 2025-12-04T13:38:32.1419234Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1419472Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1419475Z 2025-12-04T13:38:32.1419561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1419672Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.1419738Z ====================== 1 failed, 16 deselected in 23.38s ======================= 2025-12-04T13:38:32.1419780Z Got exit code 1 2025-12-04T13:38:32.1419821Z Retrying single test... 
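The repro line printed with each failure is meant to be run directly from the base repo dir, as the log states. Purely as a convenience, a small Python wrapper around that same command is sketched below; the helper name is hypothetical, and the environment variables mirror the log output exactly.

# Hypothetical convenience wrapper around the repro command printed above.
# Run from the base pytorch repo dir; env vars are the ones the log prints.
import os
import subprocess

def reproduce(test_id: str) -> int:
    env = dict(os.environ)
    env["PYTORCH_TEST_WITH_ROCM"] = "1"
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
    cmd = ["python", "test/distributed/fsdp/test_fsdp_core.py", test_id]
    return subprocess.call(cmd, env=env)

# Example:
# reproduce("TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda")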
2025-12-04T13:38:32.1420014Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-177888eb9ff038f0.xml 2025-12-04T13:38:32.1420088Z ============================= test session starts ============================== 2025-12-04T13:38:32.1420207Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1420249Z cachedir: .pytest_cache 2025-12-04T13:38:32.1420413Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1420461Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1420505Z configfile: pytest.ini 2025-12-04T13:38:32.1420671Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1420752Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1420982Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1421031Z Running 1 items in this shard 2025-12-04T13:38:32.1421033Z 2025-12-04T13:38:32.1421351Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 13:22:37.364000 406617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 406686 2025-12-04T13:38:32.1421522Z I1204 13:22:37.365000 406617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 406687 2025-12-04T13:38:32.1421680Z I1204 13:22:37.366000 406617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 406688 2025-12-04T13:38:32.1421853Z I1204 13:22:37.366000 406617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 406689 2025-12-04T13:38:32.1422439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1422479Z _warn_cpu_init() 2025-12-04T13:38:32.1422786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1422829Z _init_core_state( 2025-12-04T13:38:32.1423342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1423409Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1423983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1424027Z _warn_cpu_init() 2025-12-04T13:38:32.1424331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1424369Z _init_core_state( 2025-12-04T13:38:32.1424876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1424939Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1425515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1425555Z _warn_cpu_init() 2025-12-04T13:38:32.1425859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1425900Z _init_core_state( 2025-12-04T13:38:32.1426395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1426468Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1427034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1427075Z _warn_cpu_init() 2025-12-04T13:38:32.1427568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1427638Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1427942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1427979Z _init_core_state( 2025-12-04T13:38:32.1428471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1428529Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1428823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1428866Z return func(*args, **kwargs) 2025-12-04T13:38:32.1429363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1429427Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1429945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1430009Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1430242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1430288Z return func(*args, **kwargs) 2025-12-04T13:38:32.1430540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1430587Z return func(*args, **kwargs) 2025-12-04T13:38:32.1430812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1430868Z return func(*args, **kwargs) 2025-12-04T13:38:32.1431093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1431134Z return func(*args, **kwargs) 2025-12-04T13:38:32.1431357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1431398Z return func(*args, **kwargs) 2025-12-04T13:38:32.1431622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1431663Z return func(*args, **kwargs) 2025-12-04T13:38:32.1431887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1431943Z return func(*args, **kwargs) 2025-12-04T13:38:32.1432166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1432208Z return func(*args, **kwargs) 2025-12-04T13:38:32.1432358Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1432524Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1432818Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1432979Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1433265Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1433408Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1433687Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1433838Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1434116Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1434267Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1434545Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1434683Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1434976Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1435137Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1435626Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1435741Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1435941Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1436310Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1436436Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1436654Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1436820Z [rank2]:E1204 13:22:46.073000 406688 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1436863Z dist init r=2, world=4 2025-12-04T13:38:32.1437002Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1437168Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1437457Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1437626Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1437913Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1438040Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1438319Z [rank0]:E1204 13:22:46.097000 406686 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1438471Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1438752Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1438899Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1439189Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1439340Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1439649Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1439803Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1440288Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 
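The RuntimeError above is raised by the test suite's memory-leak check, which compares per-device memory counters taken before and after the test (the caching-allocator figure and the driver-allocated figure quoted in the message). A simplified illustration of that comparison using public APIs, not the actual leak-check implementation in `common_utils.py`:

import torch

def snapshot(device: int) -> tuple[int, int]:
    # Bytes held by the caching allocator, and bytes the driver reports as
    # in use (total minus free), for one device.
    caching = torch.cuda.memory_allocated(device)
    free, total = torch.cuda.mem_get_info(device)
    return caching, total - free

before = snapshot(0)
# ... the test body would run here ...
torch.cuda.synchronize(0)
after = snapshot(0)
if after[0] > before[0] or after[1] > before[1]:
    raise RuntimeError(f"possible leak on device 0: {before} -> {after}")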
2025-12-04T13:38:32.1440407Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1440616Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1440984Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1441101Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1441314Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1441483Z [rank0]:E1204 13:22:46.097000 406686 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1441524Z dist init r=0, world=4 2025-12-04T13:38:32.1441664Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1441825Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1442130Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1442285Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1442573Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1442702Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1442980Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1443132Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1443423Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1443573Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1443863Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1444004Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1444287Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1444436Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1444919Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1445044Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1445244Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1445613Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1445726Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1445944Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1446110Z [rank1]:E1204 13:22:46.136000 406687 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1446152Z dist init r=1, world=4 2025-12-04T13:38:32.1446312Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1446477Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1446766Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1446923Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1447210Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1447335Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1447613Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1447771Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.1448060Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1448208Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1448486Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1448622Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1448909Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1449074Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1449555Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 2025-12-04T13:38:32.1449711Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1449908Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1450275Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1450393Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1450618Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1450786Z [rank3]:E1204 13:22:46.156000 406689 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1450825Z dist init r=3, world=4 2025-12-04T13:38:32.1451165Z [rank2]:[W1204 13:22:46.248926659 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1451500Z [rank0]:[W1204 13:22:46.329820663 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1451831Z [rank1]:[W1204 13:22:46.412140739 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1452172Z [rank3]:[W1204 13:22:46.425487487 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1452213Z FAILED [23.0390s] [100%] 2025-12-04T13:38:32.1452228Z 2025-12-04T13:38:32.1452289Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1452393Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.1452443Z Traceback (most recent call last): 2025-12-04T13:38:32.1452609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1452656Z self._join_processes(fn) 2025-12-04T13:38:32.1452831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1452889Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1453069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1453119Z raise RuntimeError(error) 2025-12-04T13:38:32.1453200Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1453263Z Traceback (most recent call last): 2025-12-04T13:38:32.1453425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1453472Z getattr(self, test_name)() 2025-12-04T13:38:32.1453633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1453672Z fn() 2025-12-04T13:38:32.1453825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1453869Z method(*args, **kwargs) 2025-12-04T13:38:32.1454027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1454069Z method(*args, **kwargs) 2025-12-04T13:38:32.1454223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1454263Z with policy(): 2025-12-04T13:38:32.1454420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1454462Z raise RuntimeError(msg) 2025-12-04T13:38:32.1454833Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 
2025-12-04T13:38:32.1454836Z 2025-12-04T13:38:32.1454912Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1455155Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1455158Z 2025-12-04T13:38:32.1455248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1455250Z 2025-12-04T13:38:32.1455255Z 2025-12-04T13:38:32.1455331Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1455422Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1455658Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-177888eb9ff038f0.xml - 2025-12-04T13:38:32.1455722Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1455985Z FAILED [23.0390s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1456035Z Traceback (most recent call last): 2025-12-04T13:38:32.1456212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1456259Z getattr(self, test_name)() 2025-12-04T13:38:32.1456422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1456461Z fn() 2025-12-04T13:38:32.1456615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1456658Z method(*args, **kwargs) 2025-12-04T13:38:32.1456810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1456853Z method(*args, **kwargs) 2025-12-04T13:38:32.1457005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1457047Z with policy(): 2025-12-04T13:38:32.1457199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1457255Z raise RuntimeError(msg) 2025-12-04T13:38:32.1457616Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1457618Z 2025-12-04T13:38:32.1457693Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1457933Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1457936Z 2025-12-04T13:38:32.1458023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1458091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
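The ProcessGroupNCCL warnings earlier in this run note that destroy_process_group() was never called before the worker processes exited. In a standalone script, the shutdown the warning (and the linked docs) asks for looks roughly like this; the barrier is only a placeholder for real per-rank work:

import torch
import torch.distributed as dist

def main(rank: int, world_size: int) -> None:
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        dist.barrier()  # placeholder for the per-rank test/training work
    finally:
        # Release process-group resources even when the work above fails,
        # so the interpreter does not exit with a live process group.
        dist.destroy_process_group()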
2025-12-04T13:38:32.1458155Z ====================== 1 failed, 32 deselected in 23.20s ======================= 2025-12-04T13:38:32.1458195Z Got exit code 1 2025-12-04T13:38:32.1458236Z Retrying single test... 2025-12-04T13:38:32.1458430Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16768b6b4b00c7ee.xml 2025-12-04T13:38:32.1458498Z ============================= test session starts ============================== 2025-12-04T13:38:32.1458617Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1458659Z cachedir: .pytest_cache 2025-12-04T13:38:32.1458820Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1458866Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1458912Z configfile: pytest.ini 2025-12-04T13:38:32.1459076Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1459154Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1459385Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1459431Z Running 1 items in this shard 2025-12-04T13:38:32.1459433Z 2025-12-04T13:38:32.1459794Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda I1204 13:23:03.068000 408027 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 408096 2025-12-04T13:38:32.1459954Z I1204 13:23:03.069000 408027 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 408097 2025-12-04T13:38:32.1460125Z I1204 13:23:03.069000 408027 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 408098 2025-12-04T13:38:32.1460278Z I1204 13:23:03.070000 408027 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 408099 2025-12-04T13:38:32.1460871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1460917Z _warn_cpu_init() 2025-12-04T13:38:32.1461222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1461285Z _init_core_state( 2025-12-04T13:38:32.1461778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1461843Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1462418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1462463Z _warn_cpu_init() 2025-12-04T13:38:32.1462767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1462804Z _init_core_state( 2025-12-04T13:38:32.1463309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1463373Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1463948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1463990Z _warn_cpu_init() 2025-12-04T13:38:32.1464288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1464330Z _init_core_state( 2025-12-04T13:38:32.1464831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1464904Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1465477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1465519Z _warn_cpu_init() 2025-12-04T13:38:32.1466012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1466084Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1466572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1466630Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1467120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1467182Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1467495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1467537Z _init_core_state( 2025-12-04T13:38:32.1468024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1468088Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1468378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1468426Z return func(*args, **kwargs) 2025-12-04T13:38:32.1468656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1468702Z return func(*args, **kwargs) 2025-12-04T13:38:32.1468940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1468982Z return func(*args, **kwargs) 2025-12-04T13:38:32.1469219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
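The c10d_logger warning above ("barrier(): using the device under current context") can be silenced the way the message suggests, by binding the process group to a device at init time. A minimal sketch, assuming a torchrun-style launch; the environment-variable handling is illustrative:

import os
import torch
import torch.distributed as dist

# Assumes LOCAL_RANK and the rendezvous variables are set by the launcher.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
dist.init_process_group(
    "nccl",
    device_id=torch.device("cuda", local_rank),  # binds the group to one device
)
dist.barrier()  # no longer has to guess the device from the current context
dist.destroy_process_group()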
2025-12-04T13:38:32.1469261Z return func(*args, **kwargs) 2025-12-04T13:38:32.1469485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1469527Z return func(*args, **kwargs) 2025-12-04T13:38:32.1469786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1469827Z return func(*args, **kwargs) 2025-12-04T13:38:32.1470057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1470100Z return func(*args, **kwargs) 2025-12-04T13:38:32.1470339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1470380Z return func(*args, **kwargs) 2025-12-04T13:38:32.1470602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1470645Z return func(*args, **kwargs) 2025-12-04T13:38:32.1470795Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1470963Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1471255Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1471417Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1471719Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1471849Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1472129Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1472281Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1472560Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1472712Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1472993Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1473143Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1473424Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1473588Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1474075Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1474196Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1474392Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1474773Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1474887Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1475102Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1475267Z [rank1]:E1204 13:23:11.843000 408097 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1475311Z dist init r=1, world=4 2025-12-04T13:38:32.1475447Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1475613Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1475902Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1476066Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1476355Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1476480Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1476760Z [rank3]:E1204 13:23:11.894000 408099 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1476911Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1477193Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1477343Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1477631Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1477789Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1478068Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1478220Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1478703Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 17416847360. 
2025-12-04T13:38:32.1478836Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1479035Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1479402Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1479519Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1479773Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1479944Z [rank3]:E1204 13:23:11.894000 408099 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1479984Z dist init r=3, world=4 2025-12-04T13:38:32.1480125Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1480298Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1480589Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1480749Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1481034Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1481162Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1481440Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1481597Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1481891Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1482057Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1482339Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1482477Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1482760Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1482910Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1483401Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 17620271104. 2025-12-04T13:38:32.1483530Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1483729Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1484099Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1484213Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1484430Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1484595Z [rank0]:E1204 13:23:11.913000 408096 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1484651Z dist init r=0, world=4 2025-12-04T13:38:32.1484789Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1484952Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1485238Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1485398Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1485686Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1485811Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1486105Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1486257Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.1486552Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1486700Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1486980Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1487122Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1487401Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1487564Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1488045Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 17467179008. 2025-12-04T13:38:32.1488162Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1488358Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1488726Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1488842Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1489063Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1489233Z [rank2]:E1204 13:23:11.919000 408098 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1489273Z dist init r=2, world=4 2025-12-04T13:38:32.1489651Z [rank1]:[W1204 13:23:12.045593740 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1489980Z [rank3]:[W1204 13:23:12.154005590 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1490310Z [rank2]:[W1204 13:23:12.278942926 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1490652Z [rank0]:[W1204 13:23:12.290628555 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1490706Z FAILED [23.1388s] [100%] 2025-12-04T13:38:32.1490709Z 2025-12-04T13:38:32.1490771Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1490874Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.1490924Z Traceback (most recent call last): 2025-12-04T13:38:32.1491090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1491138Z self._join_processes(fn) 2025-12-04T13:38:32.1491314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1491372Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1491551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1491600Z raise RuntimeError(error) 2025-12-04T13:38:32.1491693Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1491741Z Traceback (most recent call last): 2025-12-04T13:38:32.1491904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1491950Z getattr(self, test_name)() 2025-12-04T13:38:32.1492113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1492149Z fn() 2025-12-04T13:38:32.1492304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1492347Z method(*args, **kwargs) 2025-12-04T13:38:32.1492503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1492545Z method(*args, **kwargs) 2025-12-04T13:38:32.1492704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1492742Z with policy(): 2025-12-04T13:38:32.1492901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1492942Z raise RuntimeError(msg) 2025-12-04T13:38:32.1493324Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 
2025-12-04T13:38:32.1493327Z 2025-12-04T13:38:32.1493404Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1493647Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1493651Z 2025-12-04T13:38:32.1493742Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1493744Z 2025-12-04T13:38:32.1493746Z 2025-12-04T13:38:32.1493821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1493912Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1494147Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16768b6b4b00c7ee.xml - 2025-12-04T13:38:32.1494210Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1494484Z FAILED [23.1388s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1494544Z Traceback (most recent call last): 2025-12-04T13:38:32.1494711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1494758Z getattr(self, test_name)() 2025-12-04T13:38:32.1494918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1494957Z fn() 2025-12-04T13:38:32.1495109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1495154Z method(*args, **kwargs) 2025-12-04T13:38:32.1495307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1495351Z method(*args, **kwargs) 2025-12-04T13:38:32.1495504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1495556Z with policy(): 2025-12-04T13:38:32.1495715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1495757Z raise RuntimeError(msg) 2025-12-04T13:38:32.1496114Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 17483956224. 2025-12-04T13:38:32.1496116Z 2025-12-04T13:38:32.1496193Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1496433Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1496435Z 2025-12-04T13:38:32.1496522Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1496592Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.1496656Z ====================== 1 failed, 32 deselected in 23.28s ======================= 2025-12-04T13:38:32.1496698Z Got exit code 1 2025-12-04T13:38:32.1496883Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.1497024Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1497212Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f3169d33739d2fe6.xml 2025-12-04T13:38:32.1497275Z ============================= test session starts ============================== 2025-12-04T13:38:32.1497392Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1497436Z cachedir: .pytest_cache 2025-12-04T13:38:32.1497599Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1497646Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1497690Z configfile: pytest.ini 2025-12-04T13:38:32.1497854Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1497932Z collecting ... collected 60 items / 17 deselected / 43 selected 2025-12-04T13:38:32.1497986Z stepcurrent: skipping 17 already run items. 2025-12-04T13:38:32.1498033Z Running 16 items in this shard 2025-12-04T13:38:32.1498035Z 2025-12-04T13:38:32.1498386Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda I1204 13:23:28.993000 409437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 409506 2025-12-04T13:38:32.1498558Z I1204 13:23:28.994000 409437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 409507 2025-12-04T13:38:32.1498710Z I1204 13:23:28.994000 409437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 409508 2025-12-04T13:38:32.1498865Z I1204 13:23:28.995000 409437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 409509 2025-12-04T13:38:32.1499449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1499489Z _warn_cpu_init() 2025-12-04T13:38:32.1500044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
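The two UserWarnings above ("module is on CPU" and "device_id cuda ... does not have an explicit index") both point at the same fix: give FSDP a per-rank device with an explicit index, or call torch.cuda.set_device() first. A minimal sketch of that pattern, assuming the process group is already initialized and using a placeholder module (not the test's own wrappers in common_fsdp.py):

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_explicit_device(module: torch.nn.Module) -> FSDP:
        rank = dist.get_rank()
        torch.cuda.set_device(rank)  # make the bare "cuda" device resolve to this rank's GPU
        # An indexed device_id also moves the CPU module to that GPU before sharding,
        # which avoids the "sharding initialization run on CPU" warning above.
        return FSDP(module, device_id=torch.device("cuda", rank))

    # e.g. wrap_with_explicit_device(torch.nn.Linear(8, 8))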
2025-12-04T13:38:32.1500109Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1500692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1500734Z _warn_cpu_init() 2025-12-04T13:38:32.1501224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1501303Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1501873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1501915Z _warn_cpu_init() 2025-12-04T13:38:32.1502409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1502468Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1503052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1503104Z _warn_cpu_init() 2025-12-04T13:38:32.1503399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1503483Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1503978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1504040Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1504330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1504428Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1504724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1504806Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1505094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1505177Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1505676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1505745Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1506035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1506113Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1506609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1506669Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1506961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1507037Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1507335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1507429Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1507919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1507982Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1508269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1508348Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1509672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1509816Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1510051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1510099Z return func(*args, **kwargs) 2025-12-04T13:38:32.1511390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1511520Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1511750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
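The FutureWarning repeated above deprecates the NO_SHARD sharding strategy and recommends DistributedDataParallel instead. Since NO_SHARD keeps full parameters on every rank and only synchronizes gradients, the drop-in replacement it suggests looks roughly like this (a sketch with a placeholder module, assuming the process group is already initialized):

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(module: torch.nn.Module) -> DDP:
        rank = dist.get_rank()
        torch.cuda.set_device(rank)
        # DDP replicates parameters per rank and all-reduces gradients,
        # matching what FSDP's deprecated NO_SHARD strategy provided.
        return DDP(module.cuda(), device_ids=[rank])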
2025-12-04T13:38:32.1511793Z return func(*args, **kwargs) 2025-12-04T13:38:32.1513064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1513203Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1513431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1513487Z return func(*args, **kwargs) 2025-12-04T13:38:32.1514748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1514873Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1515103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1515145Z return func(*args, **kwargs) 2025-12-04T13:38:32.1515383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1515425Z return func(*args, **kwargs) 2025-12-04T13:38:32.1515650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
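The AccumulateGrad stream-mismatch warnings above spell out their own remedies: drop references that keep the previous iteration's autograd graph alive, or, if the mismatch is intentional, call the suppression function named in the message. A small sketch of both options (illustrative; the suppression call is quoted from the warning text itself):

    import torch

    model = torch.nn.Linear(8, 8).cuda()
    batch = torch.randn(4, 8, device="cuda")

    # Option 1: do not keep the old graph alive across iterations.
    loss = model(batch).sum()
    loss.backward()
    del loss  # dropping the loss drops the stale AccumulateGrad nodes

    # Option 2: silence the check if the stream mismatch is intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)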
2025-12-04T13:38:32.1515692Z return func(*args, **kwargs) 2025-12-04T13:38:32.1515919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1515960Z return func(*args, **kwargs) 2025-12-04T13:38:32.1516185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.1516226Z return func(*args, **kwargs) 2025-12-04T13:38:32.1516523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1516579Z return func(*args, **kwargs) 2025-12-04T13:38:32.1516726Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1516903Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1517199Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1517359Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1517648Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1517776Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1518054Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1518218Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1518500Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1518649Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1518930Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1519070Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1519354Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1519516Z [rank3]:E1204 
13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1520065Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 2025-12-04T13:38:32.1520186Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1520382Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1520778Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1520892Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1521121Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1521304Z [rank3]:E1204 13:24:01.162000 409509 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1521345Z dist init r=3, world=4 2025-12-04T13:38:32.1521487Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1521647Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1521942Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1522097Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1522385Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1522525Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1522805Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1522956Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1523233Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T13:38:32.1523385Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1523662Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1523815Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1524094Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1524247Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1524760Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17477664768. 2025-12-04T13:38:32.1524879Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1525077Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1525477Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1525605Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1525819Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1525987Z [rank2]:E1204 13:24:01.207000 409508 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1526033Z dist init r=2, world=4 2025-12-04T13:38:32.1526171Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1526334Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1526628Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1526803Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1527088Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1527215Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1527491Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1527641Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1527922Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1528069Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1528361Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1528499Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1528784Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1528935Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1529445Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17494441984. 
2025-12-04T13:38:32.1529611Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1529808Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1530210Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1530324Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1530539Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1530707Z [rank1]:E1204 13:24:01.225000 409507 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1530749Z dist init r=1, world=4 2025-12-04T13:38:32.1530890Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1531064Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1531356Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1531510Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1531797Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1531922Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1532203Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1532350Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1532641Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1532793Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1533069Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1533209Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:32.1533488Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1533641Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1534164Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17630756864. 2025-12-04T13:38:32.1534290Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1534488Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1534875Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1534992Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1535203Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1535382Z [rank0]:E1204 13:24:01.249000 409506 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1535420Z dist init r=0, world=4 2025-12-04T13:38:32.1535763Z [rank3]:[W1204 13:24:01.328484486 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1536102Z [rank2]:[W1204 13:24:01.445492076 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1536429Z [rank1]:[W1204 13:24:01.545642154 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1536758Z [rank0]:[W1204 13:24:01.611323837 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1536809Z FAILED [46.4742s] [ 6%] 2025-12-04T13:38:32.1536812Z 2025-12-04T13:38:32.1536873Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1537004Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.1537053Z Traceback (most recent call last): 2025-12-04T13:38:32.1537220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1537265Z self._join_processes(fn) 2025-12-04T13:38:32.1537443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1537498Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1537681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1537726Z raise RuntimeError(error) 2025-12-04T13:38:32.1537810Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1537856Z Traceback (most recent call last): 2025-12-04T13:38:32.1538033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1538076Z getattr(self, test_name)() 2025-12-04T13:38:32.1538240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1538288Z fn() 2025-12-04T13:38:32.1538445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1538487Z method(*args, **kwargs) 2025-12-04T13:38:32.1538641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1538683Z method(*args, **kwargs) 2025-12-04T13:38:32.1538838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1538876Z with policy(): 2025-12-04T13:38:32.1539033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1539075Z raise RuntimeError(msg) 2025-12-04T13:38:32.1539464Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 
2025-12-04T13:38:32.1539478Z 2025-12-04T13:38:32.1539557Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1539853Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1539855Z 2025-12-04T13:38:32.1539948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1539951Z 2025-12-04T13:38:32.1539953Z 2025-12-04T13:38:32.1540029Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1540121Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1540356Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f3169d33739d2fe6.xml - 2025-12-04T13:38:32.1540421Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1540711Z FAILED [46.4742s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1540761Z Traceback (most recent call last): 2025-12-04T13:38:32.1540934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1540976Z getattr(self, test_name)() 2025-12-04T13:38:32.1541140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1541178Z fn() 2025-12-04T13:38:32.1541335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1541376Z method(*args, **kwargs) 2025-12-04T13:38:32.1541532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1541573Z method(*args, **kwargs) 2025-12-04T13:38:32.1541730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1541767Z with policy(): 2025-12-04T13:38:32.1541922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1541962Z raise RuntimeError(msg) 2025-12-04T13:38:32.1542363Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 2025-12-04T13:38:32.1542389Z 2025-12-04T13:38:32.1542464Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1542729Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1542732Z 2025-12-04T13:38:32.1542821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1542884Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
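Two recurring warnings in this run concern process-group lifecycle rather than the leak itself: barrier() suggests passing device_id to init_process_group, and ProcessGroupNCCL warns when destroy_process_group() is never called before exit. A minimal sketch that addresses both, assuming the usual RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT environment set by a launcher (the test harness manages this itself):

    import os
    import torch
    import torch.distributed as dist

    def main():
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        torch.cuda.set_device(rank)
        # Binding the group to an indexed device silences the barrier() warning.
        dist.init_process_group(
            "nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            dist.destroy_process_group()  # explicit teardown avoids the exit-time warning

    if __name__ == "__main__":
        main()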
2025-12-04T13:38:32.1542949Z ====================== 1 failed, 17 deselected in 46.64s ======================= 2025-12-04T13:38:32.1542986Z Got exit code 1 2025-12-04T13:38:32.1543028Z Retrying single test... 2025-12-04T13:38:32.1543218Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3701308a73cd137b.xml 2025-12-04T13:38:32.1543291Z ============================= test session starts ============================== 2025-12-04T13:38:32.1543404Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1543447Z cachedir: .pytest_cache 2025-12-04T13:38:32.1543607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1543654Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1543694Z configfile: pytest.ini 2025-12-04T13:38:32.1543860Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1543934Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1544193Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1544238Z Running 1 items in this shard 2025-12-04T13:38:32.1544240Z 2025-12-04T13:38:32.1544583Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda I1204 13:24:18.101000 410703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 410772 2025-12-04T13:38:32.1544752Z I1204 13:24:18.102000 410703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 410773 2025-12-04T13:38:32.1544903Z I1204 13:24:18.102000 410703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 410774 2025-12-04T13:38:32.1545055Z I1204 13:24:18.103000 410703 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 410775 2025-12-04T13:38:32.1545638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1545678Z _warn_cpu_init() 2025-12-04T13:38:32.1546179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1546243Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1546833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1546871Z _warn_cpu_init() 2025-12-04T13:38:32.1547365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1547426Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1548016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1548054Z _warn_cpu_init() 2025-12-04T13:38:32.1548541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1548602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1549179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1549220Z _warn_cpu_init() 2025-12-04T13:38:32.1549518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1549636Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1549928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1550008Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1550506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1550565Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1550868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1550962Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1551451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1551510Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1551802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1551883Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1552169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1552260Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1552548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1552621Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1553110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1553171Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1553455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1553546Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1554035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1554095Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1554385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1554459Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1555741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1555880Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1557142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1557276Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1557506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1557551Z return func(*args, **kwargs) 2025-12-04T13:38:32.1557775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1557820Z return func(*args, **kwargs) 2025-12-04T13:38:32.1559085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1559207Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1559436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1559478Z return func(*args, **kwargs) 2025-12-04T13:38:32.1560788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1560924Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1561149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1561192Z return func(*args, **kwargs) 2025-12-04T13:38:32.1561416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1561471Z return func(*args, **kwargs) 2025-12-04T13:38:32.1561692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1561732Z return func(*args, **kwargs) 2025-12-04T13:38:32.1561955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1561995Z return func(*args, **kwargs) 2025-12-04T13:38:32.1562214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1562255Z return func(*args, **kwargs) 2025-12-04T13:38:32.1562549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1562589Z return func(*args, **kwargs) 2025-12-04T13:38:32.1562734Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1562910Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1563204Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1563360Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1563647Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1563772Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1564051Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1564213Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1564489Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1564651Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1564927Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1565066Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1565347Z 
[rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1565495Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1566020Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 2025-12-04T13:38:32.1566137Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1566336Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1566731Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1566846Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1567062Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1567237Z [rank3]:E1204 13:24:50.291000 410775 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1567278Z dist init r=3, world=4 2025-12-04T13:38:32.1567416Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1567579Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1567867Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1568024Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1568312Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1568435Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1568726Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1568885Z [rank1]:E1204 13:24:50.296000 410773 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1569161Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1569308Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1569687Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1569827Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1570106Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1570269Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1570783Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17494441984. 2025-12-04T13:38:32.1570899Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1571098Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1571490Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1571619Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1571833Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1571998Z [rank1]:E1204 13:24:50.296000 410773 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1572038Z dist init r=1, world=4 2025-12-04T13:38:32.1572177Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1572336Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1572625Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:32.1572779Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1573078Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1573216Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1573494Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1573643Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1573917Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1574066Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1574344Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1574491Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1574771Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1574918Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1575429Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17477664768. 
2025-12-04T13:38:32.1575545Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1575762Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1576153Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1576265Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1576480Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1576643Z [rank2]:E1204 13:24:50.343000 410774 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1576684Z dist init r=2, world=4 2025-12-04T13:38:32.1576821Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1576982Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1577278Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1577444Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1577733Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1577858Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1578135Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1578283Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1578560Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1578718Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1578997Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1579135Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:32.1579413Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1579563Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1580120Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17630756864. 2025-12-04T13:38:32.1580239Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1580434Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1580822Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1580939Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1581150Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1581314Z [rank0]:E1204 13:24:50.348000 410772 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1581352Z dist init r=0, world=4 2025-12-04T13:38:32.1581701Z [rank3]:[W1204 13:24:50.447180058 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1582042Z [rank1]:[W1204 13:24:50.466136064 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1582370Z [rank0]:[W1204 13:24:50.598576057 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1582701Z [rank2]:[W1204 13:24:50.619712354 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1582741Z FAILED [46.4716s] [100%] 2025-12-04T13:38:32.1582744Z 2025-12-04T13:38:32.1582804Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1582943Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.1582992Z Traceback (most recent call last): 2025-12-04T13:38:32.1583155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1583201Z self._join_processes(fn) 2025-12-04T13:38:32.1583375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1583431Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1583611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1583656Z raise RuntimeError(error) 2025-12-04T13:38:32.1583736Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1583784Z Traceback (most recent call last): 2025-12-04T13:38:32.1583946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1583990Z getattr(self, test_name)() 2025-12-04T13:38:32.1584161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1584196Z fn() 2025-12-04T13:38:32.1584352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1584392Z method(*args, **kwargs) 2025-12-04T13:38:32.1584547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1584587Z method(*args, **kwargs) 2025-12-04T13:38:32.1584742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1584780Z with policy(): 2025-12-04T13:38:32.1584936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1584977Z raise RuntimeError(msg) 2025-12-04T13:38:32.1585366Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 
2025-12-04T13:38:32.1585368Z 2025-12-04T13:38:32.1585454Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1585719Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1585732Z 2025-12-04T13:38:32.1585820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1585824Z 2025-12-04T13:38:32.1585826Z 2025-12-04T13:38:32.1585901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1585991Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1586225Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3701308a73cd137b.xml - 2025-12-04T13:38:32.1586288Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1586564Z FAILED [46.4716s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1586612Z Traceback (most recent call last): 2025-12-04T13:38:32.1586788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1586832Z getattr(self, test_name)() 2025-12-04T13:38:32.1586994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1587031Z fn() 2025-12-04T13:38:32.1587186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1587228Z method(*args, **kwargs) 2025-12-04T13:38:32.1587381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1587426Z method(*args, **kwargs) 2025-12-04T13:38:32.1587578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1587618Z with policy(): 2025-12-04T13:38:32.1587774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1587814Z raise RuntimeError(msg) 2025-12-04T13:38:32.1588212Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 2025-12-04T13:38:32.1588214Z 2025-12-04T13:38:32.1588288Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1588550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1588553Z 2025-12-04T13:38:32.1588640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1588706Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
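The failure above is the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK assertion, and the warnings leading up to it (a bare `device_id` of "cuda" passed to FSDP, `barrier()` falling back to the current device, and `destroy_process_group()` never being called before exit) all concern process-group setup and teardown. Below is a minimal, illustrative sketch of how a standalone FSDP script can avoid those warnings; it is not the test-suite code, and it assumes a torchrun-style launch that sets LOCAL_RANK with one GPU per rank.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main() -> None:
    # Assumption: launched via torchrun, which sets LOCAL_RANK (and rendezvous env vars).
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)                       # make the current device explicit before FSDP init
    dist.init_process_group("nccl", device_id=device)   # an indexed device also silences the barrier() warning

    # Pass an indexed device to FSDP instead of the bare "cuda" string seen in the warnings.
    model = FSDP(nn.Linear(8, 8), device_id=device)
    loss = model(torch.randn(4, 8, device=device)).sum()
    loss.backward()

    dist.barrier()
    dist.destroy_process_group()                        # avoids the ProcessGroupNCCL shutdown warning

if __name__ == "__main__":
    main()

Launched with, for example, torchrun --nproc-per-node=4 (the script name is up to the user), this setup addresses the warnings; it does not by itself explain the allocator growth that the leak check reports.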
2025-12-04T13:38:32.1588768Z ====================== 1 failed, 32 deselected in 46.63s ======================= 2025-12-04T13:38:32.1588807Z Got exit code 1 2025-12-04T13:38:32.1588847Z Retrying single test... 2025-12-04T13:38:32.1589039Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2e31e92306f47555.xml 2025-12-04T13:38:32.1589097Z ============================= test session starts ============================== 2025-12-04T13:38:32.1589213Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1589265Z cachedir: .pytest_cache 2025-12-04T13:38:32.1589426Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1589485Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1589526Z configfile: pytest.ini 2025-12-04T13:38:32.1589726Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1589801Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1590059Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1590103Z Running 1 items in this shard 2025-12-04T13:38:32.1590105Z 2025-12-04T13:38:32.1590444Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda I1204 13:25:07.172000 411969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 412038 2025-12-04T13:38:32.1590601Z I1204 13:25:07.173000 411969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 412039 2025-12-04T13:38:32.1590784Z I1204 13:25:07.174000 411969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 412040 2025-12-04T13:38:32.1590933Z I1204 13:25:07.174000 411969 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 412041 2025-12-04T13:38:32.1591514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1591554Z _warn_cpu_init() 2025-12-04T13:38:32.1592045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1592109Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1592692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1592733Z _warn_cpu_init() 2025-12-04T13:38:32.1593223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1593284Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1593870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1593920Z _warn_cpu_init() 2025-12-04T13:38:32.1594413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1594474Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1595049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1595099Z _warn_cpu_init() 2025-12-04T13:38:32.1595389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1595473Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1595966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1596024Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1596317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1596398Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1596919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1596977Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1597264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1597346Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1597635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1597714Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1598000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1598077Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1598375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1598462Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1598954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1599014Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1599302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1599381Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T13:38:32.1599920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1599994Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1600282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1600356Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.1601647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1601776Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1602007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1602056Z return func(*args, **kwargs) 2025-12-04T13:38:32.1603325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1603465Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1604730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1604862Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1605093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1605135Z return func(*args, **kwargs) 2025-12-04T13:38:32.1605362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1605404Z return func(*args, **kwargs) 2025-12-04T13:38:32.1606677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1606800Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1607026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1607069Z return func(*args, **kwargs) 2025-12-04T13:38:32.1607292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1607336Z return func(*args, **kwargs) 2025-12-04T13:38:32.1607566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1607610Z return func(*args, **kwargs) 2025-12-04T13:38:32.1607849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T13:38:32.1607892Z return func(*args, **kwargs) 2025-12-04T13:38:32.1608113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1608155Z return func(*args, **kwargs) 2025-12-04T13:38:32.1608452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1608492Z return func(*args, **kwargs) 2025-12-04T13:38:32.1608639Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1608802Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1609106Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1609261Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1609549Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1609708Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1609988Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1610142Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1610434Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1610584Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1610860Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1611002Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1616225Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1616383Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1616928Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver 
API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 1. CUDA driver allocated memory was 2317352960 and is now 17494441984. 2025-12-04T13:38:32.1617063Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1617262Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1617658Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1617776Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1617990Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1618160Z [rank1]:E1204 13:25:39.390000 412039 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1618215Z dist init r=1, world=4 2025-12-04T13:38:32.1618358Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1618519Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1618815Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1618969Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1619255Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1619385Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1619715Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1619869Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1620146Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1620294Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1620573Z [rank2]:E1204 13:25:39.392000 412040 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1620712Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1620993Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1621154Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1621668Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17477664768. 2025-12-04T13:38:32.1621797Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1621996Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1622382Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1622498Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1622729Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1622893Z [rank2]:E1204 13:25:39.392000 412040 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1622940Z dist init r=2, world=4 2025-12-04T13:38:32.1623077Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1623241Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1623529Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1623688Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1623973Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1624111Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T13:38:32.1624392Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1624538Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1624819Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1624966Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1625245Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1625392Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1625674Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1625837Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1626344Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 0. CUDA driver allocated memory was 2453667840 and is now 17630756864. 
2025-12-04T13:38:32.1626460Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1626656Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1627045Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1627173Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1627386Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1627553Z [rank0]:E1204 13:25:39.424000 412038 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1627592Z dist init r=0, world=4 2025-12-04T13:38:32.1627733Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1627895Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1628183Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1628347Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1628635Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1628762Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1629042Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1629192Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1629471Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1629653Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1629965Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1630122Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:32.1630403Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1630554Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1631063Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 3. CUDA driver allocated memory was 2250244096 and is now 17427333120. 2025-12-04T13:38:32.1631191Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1631389Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1631774Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1631889Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1632103Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1632267Z [rank3]:E1204 13:25:39.436000 412041 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1632310Z dist init r=3, world=4 2025-12-04T13:38:32.1632660Z [rank2]:[W1204 13:25:39.556824788 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1632992Z [rank1]:[W1204 13:25:39.605396743 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1633320Z [rank3]:[W1204 13:25:39.709245835 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1633650Z [rank0]:[W1204 13:25:39.747508723 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1633696Z FAILED [46.4710s] [100%] 2025-12-04T13:38:32.1633699Z 2025-12-04T13:38:32.1633757Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1633889Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.1633936Z Traceback (most recent call last): 2025-12-04T13:38:32.1634115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1634160Z self._join_processes(fn) 2025-12-04T13:38:32.1634347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1634403Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1634586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1634630Z raise RuntimeError(error) 2025-12-04T13:38:32.1634714Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1634761Z Traceback (most recent call last): 2025-12-04T13:38:32.1634926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1634970Z getattr(self, test_name)() 2025-12-04T13:38:32.1635130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1635167Z fn() 2025-12-04T13:38:32.1635332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1635373Z method(*args, **kwargs) 2025-12-04T13:38:32.1635528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1635568Z method(*args, **kwargs) 2025-12-04T13:38:32.1635723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1635764Z with policy(): 2025-12-04T13:38:32.1635917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1635961Z raise RuntimeError(msg) 2025-12-04T13:38:32.1636347Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17477664768. 
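The leak-check failure above is raised by the context manager at common_utils.py:2705, which records CUDA memory counters before the test body and compares them in __exit__. The following is a rough sketch of that idea using only the public torch.cuda memory APIs; it is an approximation for illustration, not the actual CudaMemoryLeakCheck implementation (which also consults the driver-level counters quoted in the error message).

import gc
import torch

class SimpleCudaLeakCheck:
    """Approximate leak check: compare per-device caching-allocator usage around a block."""

    def __enter__(self):
        gc.collect()
        torch.cuda.synchronize()
        self.before = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # never mask an exception raised by the test body itself
        gc.collect()
        torch.cuda.synchronize()
        after = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
        for dev, (b, a) in enumerate(zip(self.before, after)):
            if a > b:
                raise RuntimeError(
                    f"possible leak on device {dev}: allocated memory was {b} and is now {a}"
                )
        return False

# usage sketch:
# with SimpleCudaLeakCheck():
#     run_one_test_case()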
2025-12-04T13:38:32.1636352Z 2025-12-04T13:38:32.1636430Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1636703Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1636706Z 2025-12-04T13:38:32.1636798Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1636800Z 2025-12-04T13:38:32.1636802Z 2025-12-04T13:38:32.1636882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1636973Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1637212Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2e31e92306f47555.xml - 2025-12-04T13:38:32.1637275Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1637557Z FAILED [46.4710s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1637603Z Traceback (most recent call last): 2025-12-04T13:38:32.1637773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1637815Z getattr(self, test_name)() 2025-12-04T13:38:32.1637990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1638025Z fn() 2025-12-04T13:38:32.1638180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1638232Z method(*args, **kwargs) 2025-12-04T13:38:32.1638386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1638426Z method(*args, **kwargs) 2025-12-04T13:38:32.1638580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1638618Z with policy(): 2025-12-04T13:38:32.1638776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1638816Z raise RuntimeError(msg) 2025-12-04T13:38:32.1639210Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 215552 on device 2. CUDA driver allocated memory was 2300575744 and is now 17477664768. 2025-12-04T13:38:32.1639226Z 2025-12-04T13:38:32.1639307Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1639606Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1639608Z 2025-12-04T13:38:32.1639700Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1639764Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
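The ProcessGroupNCCL warnings earlier in this failure note that destroy_process_group() was not called before the worker processes exited. A minimal sketch of the recommended setup and teardown, assuming a launcher such as torchrun that exports LOCAL_RANK and the rendezvous variables; this is illustrative, not the test harness's own worker code.

import os
import torch
import torch.distributed as dist

def main() -> None:
    # torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT,
    # which init_process_group picks up with the default env:// rendezvous.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    try:
        t = torch.ones(1, device="cuda")
        dist.all_reduce(t)  # stand-in for the real collective work
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called
        # before program exit" warning seen above.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()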
2025-12-04T13:38:32.1639831Z ====================== 1 failed, 32 deselected in 46.62s ======================= 2025-12-04T13:38:32.1639869Z Got exit code 1 2025-12-04T13:38:32.1640082Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda 2025-12-04T13:38:32.1640213Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1640404Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9797a90b2134b2b2.xml 2025-12-04T13:38:32.1640462Z ============================= test session starts ============================== 2025-12-04T13:38:32.1640595Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1640637Z cachedir: .pytest_cache 2025-12-04T13:38:32.1640800Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1640849Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1640891Z configfile: pytest.ini 2025-12-04T13:38:32.1641058Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1641134Z collecting ... collected 60 items / 18 deselected / 42 selected 2025-12-04T13:38:32.1641189Z stepcurrent: skipping 18 already run items. 2025-12-04T13:38:32.1641232Z Running 15 items in this shard 2025-12-04T13:38:32.1641234Z 2025-12-04T13:38:32.1641572Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 13:25:56.344000 413235 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 413304 2025-12-04T13:38:32.1641727Z I1204 13:25:56.344000 413235 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 413305 2025-12-04T13:38:32.1641897Z I1204 13:25:56.345000 413235 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 413306 2025-12-04T13:38:32.1642047Z I1204 13:25:56.346000 413235 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 413307 2025-12-04T13:38:32.1642650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1642694Z _warn_cpu_init() 2025-12-04T13:38:32.1642996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1643036Z _init_core_state( 2025-12-04T13:38:32.1643531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1643611Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1644191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1644231Z _warn_cpu_init() 2025-12-04T13:38:32.1644529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1644567Z _init_core_state( 2025-12-04T13:38:32.1645080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1645140Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1645713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1645754Z _warn_cpu_init() 2025-12-04T13:38:32.1646046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1646086Z _init_core_state( 2025-12-04T13:38:32.1646588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1646650Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1647231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1647271Z _warn_cpu_init() 2025-12-04T13:38:32.1647759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1647819Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1648318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1648376Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1648673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1648711Z _init_core_state( 2025-12-04T13:38:32.1649208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1649270Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1649818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1649879Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1651160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1651287Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1652558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1652684Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1653956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1654076Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1655336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1655460Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1655690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1655736Z return func(*args, **kwargs) 2025-12-04T13:38:32.1655974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1656018Z return func(*args, **kwargs) 2025-12-04T13:38:32.1656245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1656299Z return func(*args, **kwargs) 2025-12-04T13:38:32.1656525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1656566Z return func(*args, **kwargs) 2025-12-04T13:38:32.1656790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1656830Z return func(*args, **kwargs) 2025-12-04T13:38:32.1657053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1657093Z return func(*args, **kwargs) 2025-12-04T13:38:32.1657316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1657367Z return func(*args, **kwargs) 2025-12-04T13:38:32.1657589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1657628Z return func(*args, **kwargs) 2025-12-04T13:38:32.1657924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.1657964Z return func(*args, **kwargs) 2025-12-04T13:38:32.1658112Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1658278Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1658575Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1658743Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1659030Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1659158Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1659435Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1659629Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1659911Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1660060Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1660352Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1660503Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1660788Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1660939Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1661452Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
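The c10d_logger warning a few entries above ("barrier(): using the device under current context") says the message can be muted by passing `device_id` to `init_process_group`. A minimal sketch of that, assuming a PyTorch build (like the one in this log) whose init_process_group accepts a torch.device via device_id and a launcher that sets LOCAL_RANK:

import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)

# Binding the process group to an explicit device lets barrier() pick the
# right GPU instead of inferring it from the current context.
dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),
)

dist.barrier()
dist.destroy_process_group()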
2025-12-04T13:38:32.1661570Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1661788Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1662177Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1662290Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1662505Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1662670Z [rank1]:E1204 13:26:28.721000 413305 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1662714Z dist init r=1, world=4 2025-12-04T13:38:32.1662851Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1663013Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1663318Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1663475Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1663764Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1663889Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1664169Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1664316Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1664603Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1664755Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1665043Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1665182Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:32.1665462Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1665614Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1666124Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:38:32.1666253Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1666451Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1666839Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1666955Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1667167Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1667336Z [rank2]:E1204 13:26:28.731000 413306 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1667385Z dist init r=2, world=4 2025-12-04T13:38:32.1667526Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1667686Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1667978Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1668136Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1668424Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1668551Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1668837Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1668988Z [rank0]:E1204 13:26:28.744000 413304 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1669275Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1669423Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1669747Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1669884Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1670163Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1670327Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1670838Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:38:32.1670953Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1671148Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1671531Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1671644Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1671870Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1672034Z [rank0]:E1204 13:26:28.744000 413304 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1672078Z dist init r=0, world=4 2025-12-04T13:38:32.1672216Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1672379Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1672668Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:32.1672822Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1673125Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1673249Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1673541Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1673689Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1673967Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1674115Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1674391Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1674540Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1674819Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1674970Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1675480Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:38:32.1675597Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1675795Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1676189Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1676304Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1676515Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1676681Z [rank3]:E1204 13:26:28.748000 413307 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1676720Z dist init r=3, world=4 2025-12-04T13:38:32.1677059Z [rank1]:[W1204 13:26:28.883919614 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1677388Z [rank2]:[W1204 13:26:28.925505089 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1677726Z [rank0]:[W1204 13:26:29.004387514 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1678074Z [rank3]:[W1204 13:26:29.019036206 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1678115Z FAILED [46.7694s] [ 6%] 2025-12-04T13:38:32.1678117Z 2025-12-04T13:38:32.1678177Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1678300Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T13:38:32.1678349Z Traceback (most recent call last): 2025-12-04T13:38:32.1678515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1678561Z self._join_processes(fn) 2025-12-04T13:38:32.1678735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1678803Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1678984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1679028Z raise RuntimeError(error) 2025-12-04T13:38:32.1679112Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1679157Z Traceback (most recent call last): 2025-12-04T13:38:32.1679321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1679363Z getattr(self, test_name)() 2025-12-04T13:38:32.1679526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1679562Z fn() 2025-12-04T13:38:32.1679755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1679798Z method(*args, **kwargs) 2025-12-04T13:38:32.1679953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1679993Z method(*args, **kwargs) 2025-12-04T13:38:32.1680164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1680203Z with policy(): 2025-12-04T13:38:32.1680358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1680401Z raise RuntimeError(msg) 2025-12-04T13:38:32.1680787Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
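The traceback above shows the harness flow: each rank runs the test in its own spawned process, _join_processes waits for them, and _check_return_codes turns a non-zero exit (here code 10 from the leak check) into the RuntimeError reported on the main process. A simplified sketch of that pattern with torch.multiprocessing; the names worker and run_parallel are invented for illustration and this is not the common_distributed.py implementation.

import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    # Placeholder for the per-rank test body: a real worker would call
    # init_process_group, run one test method, and exit non-zero on failure.
    pass

def run_parallel(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(r, world_size)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            # Mirrors the role of _check_return_codes in the traceback above.
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_parallel()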
2025-12-04T13:38:32.1680791Z 2025-12-04T13:38:32.1680868Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1681128Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1681132Z 2025-12-04T13:38:32.1681223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1681225Z 2025-12-04T13:38:32.1681285Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1681332Z Traceback (most recent call last): 2025-12-04T13:38:32.1681510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1681553Z getattr(self, test_name)() 2025-12-04T13:38:32.1681730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1681765Z fn() 2025-12-04T13:38:32.1681919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1681958Z method(*args, **kwargs) 2025-12-04T13:38:32.1682111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1682150Z method(*args, **kwargs) 2025-12-04T13:38:32.1682303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1682340Z with policy(): 2025-12-04T13:38:32.1682496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1682537Z raise RuntimeError(msg) 2025-12-04T13:38:32.1682921Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:38:32.1682937Z 2025-12-04T13:38:32.1683011Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1683270Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1683272Z 2025-12-04T13:38:32.1683362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1683365Z 2025-12-04T13:38:32.1683366Z 2025-12-04T13:38:32.1683441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1683535Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.1683770Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9797a90b2134b2b2.xml - 2025-12-04T13:38:32.1683832Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1684117Z FAILED [46.7694s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1684165Z Traceback (most recent call last): 2025-12-04T13:38:32.1684329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1684373Z getattr(self, test_name)() 2025-12-04T13:38:32.1684533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1684571Z fn() 2025-12-04T13:38:32.1684723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1684765Z method(*args, **kwargs) 2025-12-04T13:38:32.1684915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1684957Z method(*args, **kwargs) 2025-12-04T13:38:32.1685110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1685146Z with policy(): 2025-12-04T13:38:32.1685309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1685350Z raise RuntimeError(msg) 2025-12-04T13:38:32.1685729Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
2025-12-04T13:38:32.1685745Z 2025-12-04T13:38:32.1685818Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1686074Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1686076Z 2025-12-04T13:38:32.1686161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1686164Z 2025-12-04T13:38:32.1686225Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1686269Z Traceback (most recent call last): 2025-12-04T13:38:32.1686433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1686488Z getattr(self, test_name)() 2025-12-04T13:38:32.1686646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1686682Z fn() 2025-12-04T13:38:32.1686832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1686874Z method(*args, **kwargs) 2025-12-04T13:38:32.1687026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1687069Z method(*args, **kwargs) 2025-12-04T13:38:32.1687221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1687260Z with policy(): 2025-12-04T13:38:32.1687411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1687455Z raise RuntimeError(msg) 2025-12-04T13:38:32.1687831Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:38:32.1687833Z 2025-12-04T13:38:32.1687919Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1688175Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1688180Z 2025-12-04T13:38:32.1688267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1688332Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.1688397Z ====================== 1 failed, 18 deselected in 46.93s ======================= 2025-12-04T13:38:32.1688436Z Got exit code 1 2025-12-04T13:38:32.1688475Z Retrying single test... 
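The "Retrying single test..." and "FAILED CONSISTENTLY" lines reflect the runner's behavior: after a shard fails, the failing node id is rerun in isolation (the stepcurrent lines below select only that test), and a test that also fails the rerun is marked as consistently failing while the rest of the shard continues because continue-through-error is set. A rough sketch of that control flow; the command and function below are hypothetical and are not the actual run_test.py logic.

import subprocess

def rerun_in_isolation(test_id: str, retries: int = 1) -> bool:
    """Hypothetical helper: rerun a failing pytest node id by itself."""
    for _ in range(retries):
        print("Retrying single test...", test_id)
        result = subprocess.run(["python", "-m", "pytest", "-x", test_id])
        if result.returncode == 0:
            return True  # flaky: passed on retry
    print("FAILED CONSISTENTLY:", test_id)
    return False  # consistently failing; the caller continues with the remaining tests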
2025-12-04T13:38:32.1688668Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-abfa87f8a65806d1.xml 2025-12-04T13:38:32.1688726Z ============================= test session starts ============================== 2025-12-04T13:38:32.1688841Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1688882Z cachedir: .pytest_cache 2025-12-04T13:38:32.1689052Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1689098Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1689140Z configfile: pytest.ini 2025-12-04T13:38:32.1689315Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1689391Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1689689Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1689735Z Running 1 items in this shard 2025-12-04T13:38:32.1689738Z 2025-12-04T13:38:32.1690068Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 13:26:45.567000 414501 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 414570 2025-12-04T13:38:32.1690224Z I1204 13:26:45.568000 414501 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 414571 2025-12-04T13:38:32.1690378Z I1204 13:26:45.568000 414501 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 414572 2025-12-04T13:38:32.1690543Z I1204 13:26:45.569000 414501 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 414573 2025-12-04T13:38:32.1691128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1691167Z _warn_cpu_init() 2025-12-04T13:38:32.1691469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1691508Z _init_core_state( 2025-12-04T13:38:32.1692017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
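The FSDP UserWarnings repeated above ask for either an explicit torch.cuda.set_device() call before wrapping or an indexed device_id, so that the CPU module is moved and sharding initialization runs on the intended GPU. A minimal sketch of the suggested initialization with a toy module; it assumes the default process group is already initialized and that LOCAL_RANK comes from the launcher.

import os
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)  # gives the bare "cuda" device an explicit index for this rank

model = nn.Linear(16, 16)  # toy stand-in for the test's mixture-of-experts model
fsdp_model = FSDP(
    model,
    device_id=local_rank,                           # explicit index instead of bare "cuda"
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # FSDP falls back to NO_SHARD when the world size is 1
    sync_module_states=True,                        # requires the module to reach the GPU, per the warning
)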
2025-12-04T13:38:32.1692081Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1692651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1692692Z _warn_cpu_init() 2025-12-04T13:38:32.1692986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1693026Z _init_core_state( 2025-12-04T13:38:32.1693537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1693600Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1694175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1694228Z _warn_cpu_init() 2025-12-04T13:38:32.1694528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1694565Z _init_core_state( 2025-12-04T13:38:32.1695056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1695130Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1695699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
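The FSDP UserWarnings above are initialization hints rather than failures: the wrapped module is still on CPU, and the device_id passed is the bare "cuda" device with no index. A minimal sketch of the setup those warnings recommend is below; the module, rank handling, and rendezvous settings are placeholders, not taken from this test.

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def init_fsdp(rank: int, world_size: int) -> FSDP:
        # Assumes MASTER_ADDR / MASTER_PORT are already exported for rendezvous.
        torch.cuda.set_device(rank)  # make the current device explicit for this rank
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        module = torch.nn.Linear(8, 8)  # built on CPU, like the warned-about case
        # Passing an indexed device_id moves the module to that GPU for sharding init,
        # avoiding both the CPU-init warning and the "no explicit index" warning.
        return FSDP(module, device_id=torch.device("cuda", rank))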
2025-12-04T13:38:32.1695739Z _warn_cpu_init() 2025-12-04T13:38:32.1696229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1696289Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1696791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1696849Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1697146Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1697184Z _init_core_state( 2025-12-04T13:38:32.1697674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1697731Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1698228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1698299Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1699614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1699758Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1701019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1701146Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1702406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1702528Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1703790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
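The repeated AccumulateGrad stream-mismatch warning above is advisory, and the warning text itself names the switch for silencing it when the mismatch is intentional; a one-line sketch:

    import torch

    # Only appropriate if the stream mismatch described above is intentional.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)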
2025-12-04T13:38:32.1703926Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1704157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1704202Z return func(*args, **kwargs) 2025-12-04T13:38:32.1704426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1704482Z return func(*args, **kwargs) 2025-12-04T13:38:32.1704703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1704746Z return func(*args, **kwargs) 2025-12-04T13:38:32.1704966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1705008Z return func(*args, **kwargs) 2025-12-04T13:38:32.1705228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1705270Z return func(*args, **kwargs) 2025-12-04T13:38:32.1705490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1705533Z return func(*args, **kwargs) 2025-12-04T13:38:32.1705753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1705793Z return func(*args, **kwargs) 2025-12-04T13:38:32.1706025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1706065Z return func(*args, **kwargs) 2025-12-04T13:38:32.1706358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
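The barrier() warning from c10d_logger above can be muted by giving init_process_group an explicit device, as the message suggests. A minimal sketch, assuming the rank and rendezvous details are provided elsewhere (the literal rank value here is a placeholder):

    import torch
    import torch.distributed as dist

    rank = 0  # placeholder; each process would use its own rank
    dist.init_process_group("nccl", device_id=torch.device("cuda", rank))
    dist.barrier()  # no longer has to infer the device from the current context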
2025-12-04T13:38:32.1706399Z return func(*args, **kwargs) 2025-12-04T13:38:32.1706547Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1706712Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1707004Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1707160Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1707458Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1707595Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1707871Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1708022Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1708298Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1708448Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1708722Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1708872Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1709152Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1709299Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1709847Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:38:32.1709964Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1710187Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1710571Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1710686Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1710901Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1711066Z [rank0]:E1204 13:27:17.952000 414570 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1711109Z dist init r=0, world=4 2025-12-04T13:38:32.1711247Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1711409Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1711711Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1711879Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1712164Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1712290Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1712568Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1712716Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1712995Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1713155Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1713433Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1713572Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:32.1713854Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1714004Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1714517Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:38:32.1714633Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1714830Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1715215Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1715331Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1715542Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1715707Z [rank3]:E1204 13:27:17.959000 414573 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1715745Z dist init r=3, world=4 2025-12-04T13:38:32.1715896Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1716055Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1716360Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1716513Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1716800Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1716924Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1717202Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1717363Z [rank2]:E1204 13:27:18.017000 414572 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1717639Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1717787Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1718063Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1718201Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1718480Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1718630Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1719146Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:38:32.1719259Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1719457Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1719877Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1719991Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1720220Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1720383Z [rank2]:E1204 13:27:18.017000 414572 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1720439Z dist init r=2, world=4 2025-12-04T13:38:32.1720576Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1720737Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1721025Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:32.1721181Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1721464Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1721605Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1721884Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1722032Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1722311Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1722457Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1722735Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1722870Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1723165Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1723315Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1723816Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
2025-12-04T13:38:32.1723932Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1724128Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1724520Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1724632Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1724857Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1725023Z [rank1]:E1204 13:27:18.038000 414571 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1725061Z dist init r=1, world=4 2025-12-04T13:38:32.1725405Z [rank0]:[W1204 13:27:18.111540154 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1725737Z [rank3]:[W1204 13:27:18.129101688 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1726083Z [rank2]:[W1204 13:27:18.279274038 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1726410Z [rank1]:[W1204 13:27:18.327299371 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1726452Z FAILED [46.6698s] [100%] 2025-12-04T13:38:32.1726454Z 2025-12-04T13:38:32.1726514Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1726636Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T13:38:32.1726684Z Traceback (most recent call last): 2025-12-04T13:38:32.1726849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1726895Z self._join_processes(fn) 2025-12-04T13:38:32.1727068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1727125Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1727314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1727360Z raise RuntimeError(error) 2025-12-04T13:38:32.1727439Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1727487Z Traceback (most recent call last): 2025-12-04T13:38:32.1727650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1727695Z getattr(self, test_name)() 2025-12-04T13:38:32.1727853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1727889Z fn() 2025-12-04T13:38:32.1728040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1728083Z method(*args, **kwargs) 2025-12-04T13:38:32.1728236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1728278Z method(*args, **kwargs) 2025-12-04T13:38:32.1728429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1728478Z with policy(): 2025-12-04T13:38:32.1728634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1728686Z raise RuntimeError(msg) 2025-12-04T13:38:32.1729067Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:38:32.1729070Z 2025-12-04T13:38:32.1729146Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1729406Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1729409Z 2025-12-04T13:38:32.1729496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1729498Z 2025-12-04T13:38:32.1729500Z 2025-12-04T13:38:32.1729616Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1729719Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1729958Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-abfa87f8a65806d1.xml - 2025-12-04T13:38:32.1730021Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1730293Z FAILED [46.6698s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1730341Z Traceback (most recent call last): 2025-12-04T13:38:32.1730508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1730552Z getattr(self, test_name)() 2025-12-04T13:38:32.1730712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1730750Z fn() 2025-12-04T13:38:32.1730901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1730942Z method(*args, **kwargs) 2025-12-04T13:38:32.1731092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1731147Z method(*args, **kwargs) 2025-12-04T13:38:32.1731298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1731337Z with policy(): 2025-12-04T13:38:32.1731489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1731532Z raise RuntimeError(msg) 2025-12-04T13:38:32.1731908Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:38:32.1731913Z 2025-12-04T13:38:32.1731987Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1732247Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1732249Z 2025-12-04T13:38:32.1732336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1732414Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
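The ProcessGroupNCCL warnings earlier in this attempt ("destroy_process_group() was not called before program exit") point at missing teardown in the spawned worker processes. A minimal sketch of the recommended cleanup, with the test body elided:

    import torch.distributed as dist

    def worker(rank: int, world_size: int) -> None:
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            ...  # test / training body
        finally:
            dist.destroy_process_group()  # explicit teardown avoids the NCCL resource-leak warning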
2025-12-04T13:38:32.1732477Z ====================== 1 failed, 32 deselected in 46.83s ======================= 2025-12-04T13:38:32.1732531Z Got exit code 1 2025-12-04T13:38:32.1732572Z Retrying single test... 2025-12-04T13:38:32.1732764Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a493fb1ea0337b2.xml 2025-12-04T13:38:32.1732821Z ============================= test session starts ============================== 2025-12-04T13:38:32.1732936Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1732977Z cachedir: .pytest_cache 2025-12-04T13:38:32.1733138Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1733183Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1733225Z configfile: pytest.ini 2025-12-04T13:38:32.1733390Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1733469Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1733733Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1733777Z Running 1 items in this shard 2025-12-04T13:38:32.1733780Z 2025-12-04T13:38:32.1734113Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda I1204 13:27:34.820000 415767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 415836 2025-12-04T13:38:32.1734268Z I1204 13:27:34.821000 415767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 415837 2025-12-04T13:38:32.1734422Z I1204 13:27:34.821000 415767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 415838 2025-12-04T13:38:32.1734573Z I1204 13:27:34.822000 415767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 415839 2025-12-04T13:38:32.1735172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1735211Z _warn_cpu_init() 2025-12-04T13:38:32.1735509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1735548Z _init_core_state( 2025-12-04T13:38:32.1736038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.1736103Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1736688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1736727Z _warn_cpu_init() 2025-12-04T13:38:32.1737038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1737076Z _init_core_state( 2025-12-04T13:38:32.1737570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1737630Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1738202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1738250Z _warn_cpu_init() 2025-12-04T13:38:32.1738546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1738584Z _init_core_state( 2025-12-04T13:38:32.1739076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1739137Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1739756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1739795Z _warn_cpu_init() 2025-12-04T13:38:32.1740288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1740347Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1740838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1740894Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1741202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T13:38:32.1741239Z _init_core_state( 2025-12-04T13:38:32.1741731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1741811Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1742298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1742356Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1743618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1743759Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1745025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1745153Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1746411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1746545Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1747797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1747929Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1748157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1748203Z return func(*args, **kwargs) 2025-12-04T13:38:32.1748427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1748471Z return func(*args, **kwargs) 2025-12-04T13:38:32.1748693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1748736Z return func(*args, **kwargs) 2025-12-04T13:38:32.1748958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1749009Z return func(*args, **kwargs) 2025-12-04T13:38:32.1749233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1749273Z return func(*args, **kwargs) 2025-12-04T13:38:32.1749494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1749535Z return func(*args, **kwargs) 2025-12-04T13:38:32.1749792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1749832Z return func(*args, **kwargs) 2025-12-04T13:38:32.1750055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1750095Z return func(*args, **kwargs) 2025-12-04T13:38:32.1750406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
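The NO_SHARD / full_state_dict warnings repeated above only note that with an effective world size of 1 FSDP falls back to NO_SHARD, so state_dict() already yields full, unsharded parameters. A sketch of requesting a full state dict explicitly; the model argument is assumed to be an FSDP-wrapped module such as the one sketched earlier:

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

    def full_state_dict(model: FSDP) -> dict:
        # Explicitly ask for a full (unsharded) state dict; under NO_SHARD this is
        # what state_dict() returns anyway, which is all the warning above is saying.
        with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
            return model.state_dict()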
2025-12-04T13:38:32.1750446Z return func(*args, **kwargs) 2025-12-04T13:38:32.1750592Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1750767Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1751058Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1751215Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1751501Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1751627Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1751904Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1752068Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1752347Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1752497Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1752774Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1752913Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1753191Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1753351Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1753864Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:38:32.1753982Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1754176Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1754571Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1754697Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1754909Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1755085Z [rank2]:E1204 13:28:07.055000 415838 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1755126Z dist init r=2, world=4 2025-12-04T13:38:32.1755261Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1755424Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1755714Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1755868Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1756155Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1756290Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1756569Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1756716Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1756997Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1757144Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1757420Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1757569Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T13:38:32.1757846Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1757997Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1758498Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:38:32.1758615Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1758813Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1759220Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1759345Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1759555Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1759764Z [rank3]:E1204 13:28:07.060000 415839 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1759802Z dist init r=3, world=4 2025-12-04T13:38:32.1759940Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1760101Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1760389Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1760563Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1760847Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1760972Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1761248Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1761399Z [rank1]:E1204 13:28:07.084000 415837 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1761677Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1761841Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1762119Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1762255Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1762534Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1762682Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1763187Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:38:32.1763312Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1763509Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1763908Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1764020Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1764232Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1764396Z [rank1]:E1204 13:28:07.084000 415837 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1764436Z dist init r=1, world=4 2025-12-04T13:38:32.1764571Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1764747Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1765032Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:32.1765187Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1765473Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1765594Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1765873Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1766021Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1766314Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1766462Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1766738Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1766877Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1767154Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1767302Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1767818Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:38:32.1767947Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1768141Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1768527Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1768641Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1768851Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1769031Z [rank0]:E1204 13:28:07.107000 415836 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1769069Z dist init r=0, world=4 2025-12-04T13:38:32.1769407Z [rank2]:[W1204 13:28:07.224533174 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1769944Z [rank3]:[W1204 13:28:07.238802770 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1770299Z [rank1]:[W1204 13:28:07.268634057 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1770627Z [rank0]:[W1204 13:28:07.358891388 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1770680Z FAILED [46.5692s] [100%] 2025-12-04T13:38:32.1770682Z 2025-12-04T13:38:32.1770740Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1770861Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda _ 2025-12-04T13:38:32.1770910Z Traceback (most recent call last): 2025-12-04T13:38:32.1771075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1771121Z self._join_processes(fn) 2025-12-04T13:38:32.1771295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1771352Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1771529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1771575Z raise RuntimeError(error) 2025-12-04T13:38:32.1771654Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1771701Z Traceback (most recent call last): 2025-12-04T13:38:32.1771884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1771926Z getattr(self, test_name)() 2025-12-04T13:38:32.1772087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1772136Z fn() 2025-12-04T13:38:32.1772291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1772332Z method(*args, **kwargs) 2025-12-04T13:38:32.1772487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1772527Z method(*args, **kwargs) 2025-12-04T13:38:32.1772680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1772717Z with policy(): 2025-12-04T13:38:32.1772874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1772915Z raise RuntimeError(msg) 2025-12-04T13:38:32.1773297Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:38:32.1773317Z 2025-12-04T13:38:32.1773392Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1773651Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1773653Z 2025-12-04T13:38:32.1773742Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1773744Z 2025-12-04T13:38:32.1773803Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1773851Z Traceback (most recent call last): 2025-12-04T13:38:32.1774014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1774059Z getattr(self, test_name)() 2025-12-04T13:38:32.1774219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1774255Z fn() 2025-12-04T13:38:32.1774405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1774446Z method(*args, **kwargs) 2025-12-04T13:38:32.1774618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1774659Z method(*args, **kwargs) 2025-12-04T13:38:32.1774811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1774849Z with policy(): 2025-12-04T13:38:32.1775002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1775046Z raise RuntimeError(msg) 2025-12-04T13:38:32.1775424Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:38:32.1775428Z 2025-12-04T13:38:32.1775503Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1775765Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1775767Z 2025-12-04T13:38:32.1775867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1775870Z 2025-12-04T13:38:32.1775871Z 2025-12-04T13:38:32.1775959Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1776050Z Process 2 terminated with exit code 10, terminating remaining processes. 
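The leak check above compares two figures per device: the bytes the caching allocator reports as allocated, and the bytes the driver reports as in use. As a minimal sketch (not part of the test harness; it only assumes a CUDA/ROCm-enabled torch build), the same figures can be read through the public torch.cuda APIs:

import torch

def report_gpu_memory(device: int) -> None:
    # Bytes currently held by the caching allocator -- the first number
    # quoted in the leak report above ("allocated memory was 512 ...").
    allocated = torch.cuda.memory_allocated(device)
    # Free and total bytes as seen by the driver; total - free roughly
    # corresponds to the "CUDA driver allocated memory" figure.
    free, total = torch.cuda.mem_get_info(device)
    print(f"device {device}: allocator={allocated} driver={total - free}")

Calling such a helper before and after the test body mirrors what the mem-leak check does when it decides a test leaked.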
2025-12-04T13:38:32.1776289Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a493fb1ea0337b2.xml - 2025-12-04T13:38:32.1776349Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1776628Z FAILED [46.5692s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.1776673Z Traceback (most recent call last): 2025-12-04T13:38:32.1776842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1776884Z getattr(self, test_name)() 2025-12-04T13:38:32.1777052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1777104Z fn() 2025-12-04T13:38:32.1777254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1777295Z method(*args, **kwargs) 2025-12-04T13:38:32.1777449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1777490Z method(*args, **kwargs) 2025-12-04T13:38:32.1777641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1777679Z with policy(): 2025-12-04T13:38:32.1777836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1777878Z raise RuntimeError(msg) 2025-12-04T13:38:32.1778258Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:38:32.1778261Z 2025-12-04T13:38:32.1778337Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1778606Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1778608Z 2025-12-04T13:38:32.1778696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1778698Z 2025-12-04T13:38:32.1778760Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1778803Z Traceback (most recent call last): 2025-12-04T13:38:32.1778968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1779010Z getattr(self, test_name)() 2025-12-04T13:38:32.1779170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1779204Z fn() 2025-12-04T13:38:32.1779359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1779397Z method(*args, **kwargs) 2025-12-04T13:38:32.1779550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1779630Z method(*args, **kwargs) 2025-12-04T13:38:32.1779798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1779834Z with policy(): 2025-12-04T13:38:32.1780003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1780044Z raise RuntimeError(msg) 2025-12-04T13:38:32.1780423Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:38:32.1780426Z 2025-12-04T13:38:32.1780498Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1780754Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1780756Z 2025-12-04T13:38:32.1780844Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1780909Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.1780991Z ====================== 1 failed, 32 deselected in 46.73s ======================= 2025-12-04T13:38:32.1781027Z Got exit code 1 2025-12-04T13:38:32.1781237Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda 2025-12-04T13:38:32.1781366Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1781556Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-033de2995758149b.xml 2025-12-04T13:38:32.1781614Z ============================= test session starts ============================== 2025-12-04T13:38:32.1781729Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1781771Z cachedir: .pytest_cache 2025-12-04T13:38:32.1781932Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1781979Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1782021Z configfile: pytest.ini 2025-12-04T13:38:32.1782187Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1782282Z collecting ... collected 60 items / 19 deselected / 41 selected 2025-12-04T13:38:32.1782336Z stepcurrent: skipping 19 already run items. 2025-12-04T13:38:32.1782382Z Running 14 items in this shard 2025-12-04T13:38:32.1782384Z 2025-12-04T13:38:32.1782732Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 13:28:23.837000 417033 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 417102 2025-12-04T13:38:32.1782887Z I1204 13:28:23.837000 417033 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 417103 2025-12-04T13:38:32.1783040Z I1204 13:28:23.838000 417033 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 417104 2025-12-04T13:38:32.1783190Z I1204 13:28:23.838000 417033 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 417105 2025-12-04T13:38:32.1783788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1783836Z _warn_cpu_init() 2025-12-04T13:38:32.1784145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1784186Z _init_core_state( 2025-12-04T13:38:32.1784679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1784744Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1785323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1785377Z _warn_cpu_init() 2025-12-04T13:38:32.1785680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1785719Z _init_core_state( 2025-12-04T13:38:32.1786210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1786272Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1786860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1786897Z _warn_cpu_init() 2025-12-04T13:38:32.1787200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1787236Z _init_core_state( 2025-12-04T13:38:32.1787726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1787788Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1788377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1788426Z _warn_cpu_init() 2025-12-04T13:38:32.1788911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1788972Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1789465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1789522Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1789862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1789924Z _init_core_state( 2025-12-04T13:38:32.1790417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1790477Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1790959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1791019Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1792302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1792430Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1793703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1793841Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1795095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1795234Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1796509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1796633Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1796859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1796906Z return func(*args, **kwargs) 2025-12-04T13:38:32.1797128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1797172Z return func(*args, **kwargs) 2025-12-04T13:38:32.1797393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1797436Z return func(*args, **kwargs) 2025-12-04T13:38:32.1797668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1797710Z return func(*args, **kwargs) 2025-12-04T13:38:32.1797930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1797987Z return func(*args, **kwargs) 2025-12-04T13:38:32.1798209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1798248Z return func(*args, **kwargs) 2025-12-04T13:38:32.1798471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1798510Z return func(*args, **kwargs) 2025-12-04T13:38:32.1798740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1798779Z return func(*args, **kwargs) 2025-12-04T13:38:32.1799073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.1799129Z return func(*args, **kwargs) 2025-12-04T13:38:32.1799277Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1799440Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1799773Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1799930Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1800220Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1800349Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1800645Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1800797Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1801075Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1801226Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1801505Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1801644Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1801943Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1802091Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1802634Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:38:32.1802751Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1802949Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1803347Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1803479Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1803692Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1803856Z [rank3]:E1204 13:28:56.096000 417105 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1803898Z dist init r=3, world=4 2025-12-04T13:38:32.1804034Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1804195Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1804482Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1804639Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1804942Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1805065Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1805344Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1805493Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1805773Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1805920Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1806213Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1806350Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:38:32.1806645Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1806795Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1807312Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:38:32.1807428Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1807623Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1808032Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1808148Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1808357Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1808524Z [rank1]:E1204 13:28:56.105000 417103 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1808563Z dist init r=1, world=4 2025-12-04T13:38:32.1808701Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1808861Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1809164Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1809320Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1809839Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1809969Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1810247Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1810398Z [rank2]:E1204 13:28:56.173000 417104 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1810674Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1810839Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1811115Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1811268Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1811549Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1811697Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1812215Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 2025-12-04T13:38:32.1812345Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1812544Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1812940Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1813054Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1813267Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1813431Z [rank2]:E1204 13:28:56.173000 417104 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1813473Z dist init r=2, world=4 2025-12-04T13:38:32.1813627Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1813790Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1814082Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in 
run_test 2025-12-04T13:38:32.1814245Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1814536Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1814662Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1814947Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1815109Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1815391Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1815549Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1815828Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1815966Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1816244Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1816395Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1816922Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:38:32.1817040Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1817235Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1817629Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1817746Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1817970Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1818138Z [rank0]:E1204 13:28:56.174000 417102 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1818177Z dist init r=0, world=4 2025-12-04T13:38:32.1818517Z [rank3]:[W1204 13:28:56.274381314 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1818849Z [rank1]:[W1204 13:28:56.314020855 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1819182Z [rank0]:[W1204 13:28:56.570280825 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1819520Z [rank2]:[W1204 13:28:56.575162453 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1819561Z FAILED [46.4685s] [ 7%] 2025-12-04T13:38:32.1819610Z 2025-12-04T13:38:32.1819675Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1819810Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.1819860Z Traceback (most recent call last): 2025-12-04T13:38:32.1820024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1820072Z self._join_processes(fn) 2025-12-04T13:38:32.1820247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1820306Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1820485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1820532Z raise RuntimeError(error) 2025-12-04T13:38:32.1820614Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1820679Z Traceback (most recent call last): 2025-12-04T13:38:32.1820842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1820887Z getattr(self, test_name)() 2025-12-04T13:38:32.1821049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1821086Z fn() 2025-12-04T13:38:32.1821243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1821285Z method(*args, **kwargs) 2025-12-04T13:38:32.1821442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1821483Z method(*args, **kwargs) 2025-12-04T13:38:32.1821638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1821677Z with policy(): 2025-12-04T13:38:32.1821834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1821876Z raise RuntimeError(msg) 2025-12-04T13:38:32.1822284Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:38:32.1822287Z 2025-12-04T13:38:32.1822364Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1822637Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1822641Z 2025-12-04T13:38:32.1822731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1822733Z 2025-12-04T13:38:32.1822735Z 2025-12-04T13:38:32.1822810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1822902Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1823136Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-033de2995758149b.xml - 2025-12-04T13:38:32.1823199Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1823500Z FAILED [46.4685s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1823563Z Traceback (most recent call last): 2025-12-04T13:38:32.1823729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1823775Z getattr(self, test_name)() 2025-12-04T13:38:32.1823937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1823976Z fn() 2025-12-04T13:38:32.1824129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1824174Z method(*args, **kwargs) 2025-12-04T13:38:32.1824327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1824370Z method(*args, **kwargs) 2025-12-04T13:38:32.1824525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1824574Z with policy(): 2025-12-04T13:38:32.1824729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1824771Z raise RuntimeError(msg) 2025-12-04T13:38:32.1825162Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 2025-12-04T13:38:32.1825164Z 2025-12-04T13:38:32.1825238Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1825509Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1825513Z 2025-12-04T13:38:32.1825600Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1825668Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
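[editor's note] The failure above is raised by the memory-leak checker enabled with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it snapshots per-device memory before the test and compares it again afterwards, and the two numbers quoted in the RuntimeError are those snapshots. Below is a minimal, hedged sketch of that before/after comparison, using torch.cuda.memory_allocated and torch.cuda.mem_get_info as stand-ins for the internal accounting; the actual check lives in torch/testing/_internal/common_utils.py and differs in detail, and the helper name leak_check is illustrative only.

    # Illustrative sketch only -- not the CudaMemoryLeakCheck implementation
    # from torch/testing/_internal/common_utils.py.
    import contextlib
    import torch

    @contextlib.contextmanager
    def leak_check(device: int):
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        allocator_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        yield
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        allocator_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if allocator_after > allocator_before:
            # Mirrors the "Caching allocator allocated memory was ... and is now ..." message.
            raise RuntimeError(
                f"Caching allocator allocated memory was {allocator_before} "
                f"and is now reported as {allocator_after} on device {device}."
            )
        if free_after < free_before:
            # Mirrors the "CUDA driver allocated memory was ... and is now ..." message.
            raise RuntimeError(
                f"CUDA driver allocated memory was {total - free_before} "
                f"and is now {total - free_after}."
            )

On a ROCm build, the same torch.cuda calls are backed by HIP, so the check behaves the same way under PYTORCH_TEST_WITH_ROCM=1.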
2025-12-04T13:38:32.1825732Z ====================== 1 failed, 19 deselected in 46.63s ======================= 2025-12-04T13:38:32.1825775Z Got exit code 1 2025-12-04T13:38:32.1825816Z Retrying single test... 2025-12-04T13:38:32.1826021Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cf86fdbf3893144.xml 2025-12-04T13:38:32.1826081Z ============================= test session starts ============================== 2025-12-04T13:38:32.1826199Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1826244Z cachedir: .pytest_cache 2025-12-04T13:38:32.1826403Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1826455Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1826496Z configfile: pytest.ini 2025-12-04T13:38:32.1826663Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1826737Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1827003Z stepcurrent: skipping 19 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1827048Z Running 1 items in this shard 2025-12-04T13:38:32.1827050Z 2025-12-04T13:38:32.1827400Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 13:29:12.963000 418299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 418368 2025-12-04T13:38:32.1827567Z I1204 13:29:12.963000 418299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 418369 2025-12-04T13:38:32.1827724Z I1204 13:29:12.964000 418299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 418370 2025-12-04T13:38:32.1827875Z I1204 13:29:12.964000 418299 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 418371 2025-12-04T13:38:32.1828465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1828507Z _warn_cpu_init() 2025-12-04T13:38:32.1828814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1828866Z _init_core_state( 2025-12-04T13:38:32.1829360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1829426Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1830036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1830076Z _warn_cpu_init() 2025-12-04T13:38:32.1830395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1830434Z _init_core_state( 2025-12-04T13:38:32.1830930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1830993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1831566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1831607Z _warn_cpu_init() 2025-12-04T13:38:32.1831920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1831963Z _init_core_state( 2025-12-04T13:38:32.1832467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1832531Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1833104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1833143Z _warn_cpu_init() 2025-12-04T13:38:32.1833640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1833713Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1834203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1834264Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1834753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1834828Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1835128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1835169Z _init_core_state( 2025-12-04T13:38:32.1835653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1835715Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1836995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1837134Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1838408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1838546Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1839859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1839981Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1841233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1841375Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1841609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1841669Z return func(*args, **kwargs) 2025-12-04T13:38:32.1841897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1841939Z return func(*args, **kwargs) 2025-12-04T13:38:32.1842166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1842207Z return func(*args, **kwargs) 2025-12-04T13:38:32.1842433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1842473Z return func(*args, **kwargs) 2025-12-04T13:38:32.1842695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1842750Z return func(*args, **kwargs) 2025-12-04T13:38:32.1842971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1843012Z return func(*args, **kwargs) 2025-12-04T13:38:32.1843236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1843278Z return func(*args, **kwargs) 2025-12-04T13:38:32.1843503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1843547Z return func(*args, **kwargs) 2025-12-04T13:38:32.1843841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.1843886Z return func(*args, **kwargs) 2025-12-04T13:38:32.1844032Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1844211Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1844504Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1844664Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1844951Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1845079Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1845359Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1845508Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1845801Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1845961Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1846242Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1846381Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1846663Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1846815Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1847335Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:38:32.1847468Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1847664Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1848066Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1848183Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1848399Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1848577Z [rank2]:E1204 13:29:45.487000 418370 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1848617Z dist init r=2, world=4 2025-12-04T13:38:32.1848757Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1848917Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1849209Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1849364Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1849706Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1849830Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1850127Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1850291Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1850570Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1850722Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1850998Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1851139Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] 
with policy(): 2025-12-04T13:38:32.1851418Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1851581Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1852095Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:38:32.1852211Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1852409Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1852804Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1852939Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1853151Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1853320Z [rank1]:E1204 13:29:45.492000 418369 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1853461Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1853622Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1853912Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1854066Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1854365Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1854489Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1854780Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1854930Z [rank0]:E1204 13:29:45.492000 418368 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1855213Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1855363Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1855641Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1855797Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1856079Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1856230Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1856747Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:38:32.1856862Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1857059Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1857466Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1857584Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1857794Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1857963Z [rank0]:E1204 13:29:45.492000 418368 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1858006Z dist init r=1, world=4 2025-12-04T13:38:32.1858045Z dist init r=0, world=4 2025-12-04T13:38:32.1858187Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1858348Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1858648Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1858802Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1859103Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1859226Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1859507Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1859697Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1859973Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1860137Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1860418Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1860559Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1860839Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1860990Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1861521Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:38:32.1861635Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1861834Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1862224Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1862342Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1862552Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1862719Z [rank3]:E1204 13:29:45.569000 418371 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1862761Z dist init r=3, world=4 2025-12-04T13:38:32.1863111Z [rank2]:[W1204 13:29:45.653916674 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1863459Z [rank0]:[W1204 13:29:45.677580620 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1863792Z [rank1]:[W1204 13:29:45.682040213 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1864122Z [rank3]:[W1204 13:29:45.839237476 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1864163Z FAILED [46.8716s] [100%] 2025-12-04T13:38:32.1864170Z 2025-12-04T13:38:32.1864227Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1864377Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.1864424Z Traceback (most recent call last): 2025-12-04T13:38:32.1864592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1864637Z self._join_processes(fn) 2025-12-04T13:38:32.1864813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1864868Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1865050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1865094Z raise RuntimeError(error) 2025-12-04T13:38:32.1865180Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1865227Z Traceback (most recent call last): 2025-12-04T13:38:32.1865392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1865435Z getattr(self, test_name)() 2025-12-04T13:38:32.1865597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1865644Z fn() 2025-12-04T13:38:32.1865801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1865843Z method(*args, **kwargs) 2025-12-04T13:38:32.1865999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1866041Z method(*args, **kwargs) 2025-12-04T13:38:32.1866196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1866236Z with policy(): 2025-12-04T13:38:32.1866393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1866435Z raise RuntimeError(msg) 2025-12-04T13:38:32.1866825Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 
2025-12-04T13:38:32.1866827Z 2025-12-04T13:38:32.1866908Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1867190Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1867203Z 2025-12-04T13:38:32.1867294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1867296Z 2025-12-04T13:38:32.1867298Z 2025-12-04T13:38:32.1867373Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1867463Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1867699Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0cf86fdbf3893144.xml - 2025-12-04T13:38:32.1867763Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1868046Z FAILED [46.8716s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.1868093Z Traceback (most recent call last): 2025-12-04T13:38:32.1868274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1868317Z getattr(self, test_name)() 2025-12-04T13:38:32.1868481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1868517Z fn() 2025-12-04T13:38:32.1868673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1868715Z method(*args, **kwargs) 2025-12-04T13:38:32.1868870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1868911Z method(*args, **kwargs) 2025-12-04T13:38:32.1869066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1869105Z with policy(): 2025-12-04T13:38:32.1869261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1869302Z raise RuntimeError(msg) 2025-12-04T13:38:32.1869745Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:38:32.1869748Z 2025-12-04T13:38:32.1869825Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1870094Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1870098Z 2025-12-04T13:38:32.1870187Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1870252Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
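[editor's note] The FSDP UserWarnings repeated in both runs point at the same pattern: the test passes a bare `cuda` device as `device_id` and leaves the module on CPU, so FSDP has to guess the device index and falls back to CPU sharding initialization. A hedged sketch of the fix those warnings themselves recommend follows; the helper name and world-size handling are illustrative and not taken from test_fsdp_core.py, and an initialized process group is assumed.

    # Illustrative sketch of the pattern the warnings above ask for: pin each rank
    # to an explicit device index before constructing FSDP.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_fsdp(module: torch.nn.Module) -> FSDP:
        # Assumes dist.init_process_group() has already run on this rank.
        local_device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
        # Explicit device selection, as "please explicitly call torch.cuda.set_device()" suggests.
        torch.cuda.set_device(local_device)
        # An indexed device_id avoids the "does not have an explicit index" warning and
        # moves the CPU module to the GPU for sharding init, addressing _warn_cpu_init().
        return FSDP(module, device_id=local_device)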
2025-12-04T13:38:32.1870319Z ====================== 1 failed, 32 deselected in 47.03s ======================= 2025-12-04T13:38:32.1870357Z Got exit code 1 2025-12-04T13:38:32.1870401Z Retrying single test... 2025-12-04T13:38:32.1870592Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8217f4cff6fb20ac.xml 2025-12-04T13:38:32.1870653Z ============================= test session starts ============================== 2025-12-04T13:38:32.1870766Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1870824Z cachedir: .pytest_cache 2025-12-04T13:38:32.1870984Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1871057Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1871099Z configfile: pytest.ini 2025-12-04T13:38:32.1871267Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1871342Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1871607Z stepcurrent: skipping 19 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1871651Z Running 1 items in this shard 2025-12-04T13:38:32.1871657Z 2025-12-04T13:38:32.1871998Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda I1204 13:30:02.345000 419565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 419634 2025-12-04T13:38:32.1872159Z I1204 13:30:02.345000 419565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 419635 2025-12-04T13:38:32.1872329Z I1204 13:30:02.346000 419565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 419636 2025-12-04T13:38:32.1876232Z I1204 13:30:02.347000 419565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 419637 2025-12-04T13:38:32.1876826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1876868Z _warn_cpu_init() 2025-12-04T13:38:32.1877179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1877221Z _init_core_state( 2025-12-04T13:38:32.1877738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1877807Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1878379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1878421Z _warn_cpu_init() 2025-12-04T13:38:32.1878726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1878763Z _init_core_state( 2025-12-04T13:38:32.1879267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1879340Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1879946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1879986Z _warn_cpu_init() 2025-12-04T13:38:32.1880285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1880324Z _init_core_state( 2025-12-04T13:38:32.1880813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1880894Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1881468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1881507Z _warn_cpu_init() 2025-12-04T13:38:32.1882003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1882062Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1882571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1882629Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1882929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T13:38:32.1882970Z _init_core_state( 2025-12-04T13:38:32.1883458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1883519Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1884018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.1884094Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.1885358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1885496Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1886757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1886895Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1888152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1888273Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1889541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T13:38:32.1889733Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1889963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1890024Z return func(*args, **kwargs) 2025-12-04T13:38:32.1890248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1890294Z return func(*args, **kwargs) 2025-12-04T13:38:32.1890518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1890559Z return func(*args, **kwargs) 2025-12-04T13:38:32.1890783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1890824Z return func(*args, **kwargs) 2025-12-04T13:38:32.1891049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1891092Z return func(*args, **kwargs) 2025-12-04T13:38:32.1891314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1891354Z return func(*args, **kwargs) 2025-12-04T13:38:32.1891590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1891632Z return func(*args, **kwargs) 2025-12-04T13:38:32.1891855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1891896Z return func(*args, **kwargs) 2025-12-04T13:38:32.1892192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.1892237Z return func(*args, **kwargs) 2025-12-04T13:38:32.1892383Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1892549Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1892867Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1893024Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1893323Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1893451Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1893729Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1893880Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1894161Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1894320Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1894600Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1894737Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1895018Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1895165Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1895686Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 2300575744 and is now 17486053376. 
2025-12-04T13:38:32.1895815Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1896011Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1896412Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1896528Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1896743Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1896910Z [rank2]:E1204 13:30:34.582000 419636 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1897050Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1897223Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1897508Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1897676Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1897960Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1898087Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1898364Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1898513Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1898804Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1898951Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1899232Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1899368Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1899698Z 
[rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1899849Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1900388Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:38:32.1900505Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1900699Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1901094Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1901207Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1901419Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1901597Z [rank1]:E1204 13:30:34.582000 419635 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1901640Z dist init r=2, world=4 2025-12-04T13:38:32.1901678Z dist init r=1, world=4 2025-12-04T13:38:32.1901835Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1901999Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1902287Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1902442Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1902729Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1902857Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1903145Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1903298Z [rank0]:E1204 13:30:34.640000 419634 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1903577Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1903723Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1904001Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1904138Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1904431Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1904579Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1905096Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 2453667840 and is now 17639145472. 2025-12-04T13:38:32.1905213Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1905408Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1905802Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1905927Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1906140Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1906316Z [rank0]:E1204 13:30:34.640000 419634 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1906356Z dist init r=0, world=4 2025-12-04T13:38:32.1906494Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1906655Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1906944Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in 
run_test 2025-12-04T13:38:32.1907097Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1907396Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1907520Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1907799Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1907946Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1908225Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1908374Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1908665Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1908803Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1909082Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1909230Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1909780Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 2250244096 and is now 17435721728. 
2025-12-04T13:38:32.1909895Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1910113Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1910505Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1910634Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1910847Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1911013Z [rank3]:E1204 13:30:34.666000 419637 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1911052Z dist init r=3, world=4 2025-12-04T13:38:32.1911391Z [rank2]:[W1204 13:30:34.756873150 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1911735Z [rank1]:[W1204 13:30:34.757099897 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1912065Z [rank0]:[W1204 13:30:34.906269204 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1912392Z [rank3]:[W1204 13:30:34.945333733 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1912434Z FAILED [46.5703s] [100%] 2025-12-04T13:38:32.1912437Z 2025-12-04T13:38:32.1912498Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1912631Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.1912680Z Traceback (most recent call last): 2025-12-04T13:38:32.1912845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1912888Z self._join_processes(fn) 2025-12-04T13:38:32.1913077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1913132Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1913312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1913358Z raise RuntimeError(error) 2025-12-04T13:38:32.1913441Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1913487Z Traceback (most recent call last): 2025-12-04T13:38:32.1913651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1913694Z getattr(self, test_name)() 2025-12-04T13:38:32.1913854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1913889Z fn() 2025-12-04T13:38:32.1914046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1914087Z method(*args, **kwargs) 2025-12-04T13:38:32.1914254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1914295Z method(*args, **kwargs) 2025-12-04T13:38:32.1914448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1914497Z with policy(): 2025-12-04T13:38:32.1914654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1914695Z raise RuntimeError(msg) 2025-12-04T13:38:32.1915083Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 
2025-12-04T13:38:32.1915086Z 2025-12-04T13:38:32.1915164Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1915431Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1915444Z 2025-12-04T13:38:32.1915535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1915537Z 2025-12-04T13:38:32.1915538Z 2025-12-04T13:38:32.1915615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1915706Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1915942Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8217f4cff6fb20ac.xml - 2025-12-04T13:38:32.1916006Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1916288Z FAILED [46.5703s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.1916338Z Traceback (most recent call last): 2025-12-04T13:38:32.1916512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1916553Z getattr(self, test_name)() 2025-12-04T13:38:32.1916717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1916752Z fn() 2025-12-04T13:38:32.1916917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1916959Z method(*args, **kwargs) 2025-12-04T13:38:32.1917114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1917154Z method(*args, **kwargs) 2025-12-04T13:38:32.1917308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1917346Z with policy(): 2025-12-04T13:38:32.1917504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1917544Z raise RuntimeError(msg) 2025-12-04T13:38:32.1917935Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 2317352960 and is now 17502830592. 2025-12-04T13:38:32.1917937Z 2025-12-04T13:38:32.1918011Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1918290Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1918303Z 2025-12-04T13:38:32.1918393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1918457Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
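(The RuntimeError reported above is raised by the CUDA memory-leak check that the repro command enables with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: the check in torch/testing/_internal/common_utils.py, whose __exit__ appears in the traceback, snapshots per-device allocator statistics before the test body and compares them afterwards. The Python sketch below only illustrates that before/after comparison; the name check_for_leak and the use of torch.cuda.mem_get_info as a stand-in for the driver-level counter are assumptions for illustration, not the actual leak-check implementation.

import torch

def check_for_leak(test_fn, device: int = 0) -> None:
    # Illustrative sketch only: snapshot caching-allocator and driver-level
    # memory before the test body runs.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before

    test_fn()

    # Snapshot again afterwards; growth suggests tensors or communication
    # buffers (e.g. process groups that were never destroyed) are still alive.
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before or driver_after > driver_before:
        raise RuntimeError(
            f"possible leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver "
            f"{driver_before} -> {driver_after} bytes"
        )

In the failure above the same kind of comparison reports the caching allocator growing from 512 to 117248 bytes and driver-allocated memory growing by roughly 15 GB per rank.)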
2025-12-04T13:38:32.1918521Z ====================== 1 failed, 32 deselected in 46.73s ======================= 2025-12-04T13:38:32.1918559Z Got exit code 1 2025-12-04T13:38:32.1918777Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.1918905Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.1919098Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9b67bb4b5b795d1e.xml 2025-12-04T13:38:32.1919156Z ============================= test session starts ============================== 2025-12-04T13:38:32.1919275Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1919334Z cachedir: .pytest_cache 2025-12-04T13:38:32.1919494Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1919541Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1919622Z configfile: pytest.ini 2025-12-04T13:38:32.1919787Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1919865Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T13:38:32.1919918Z stepcurrent: skipping 20 already run items. 2025-12-04T13:38:32.1919964Z Running 13 items in this shard 2025-12-04T13:38:32.1919966Z 2025-12-04T13:38:32.1920284Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda I1204 13:30:51.378000 420831 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 420900 2025-12-04T13:38:32.1920440Z I1204 13:30:51.379000 420831 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 420901 2025-12-04T13:38:32.1920596Z I1204 13:30:51.379000 420831 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 420902 2025-12-04T13:38:32.1920760Z I1204 13:30:51.380000 420831 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 420903 2025-12-04T13:38:32.1921056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1921104Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1921692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1921732Z _warn_cpu_init() 2025-12-04T13:38:32.1922016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1922065Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1922355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1922402Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1922990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1923033Z _warn_cpu_init() 2025-12-04T13:38:32.1923605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1923665Z _warn_cpu_init() 2025-12-04T13:38:32.1923956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1924044Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1924332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1924417Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1924705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1924790Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1925069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1925114Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1925697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1925738Z _warn_cpu_init() 2025-12-04T13:38:32.1926028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1926116Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1926347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1926392Z return func(*args, **kwargs) 2025-12-04T13:38:32.1926618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1926674Z return func(*args, **kwargs) 2025-12-04T13:38:32.1926896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1926951Z return func(*args, **kwargs) 2025-12-04T13:38:32.1927175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1927215Z return func(*args, **kwargs) 2025-12-04T13:38:32.1927440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1927480Z return func(*args, **kwargs) 2025-12-04T13:38:32.1927702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1927741Z return func(*args, **kwargs) 2025-12-04T13:38:32.1927960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1928013Z return func(*args, **kwargs) 2025-12-04T13:38:32.1928234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1928273Z return func(*args, **kwargs) 2025-12-04T13:38:32.1928568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1928609Z return func(*args, **kwargs) 2025-12-04T13:38:32.1929943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1930076Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1931352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1931475Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1932741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1932877Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1934135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1934254Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1934400Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1934574Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1934868Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1935026Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1935311Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1935438Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1935718Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1935870Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1936160Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1936319Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1936597Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1936735Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1937016Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1937164Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1937656Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 180736 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.1937785Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1937982Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1938358Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1938472Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1938686Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1938861Z [rank3]:E1204 13:30:59.014000 420903 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1938903Z dist init r=3, world=4 2025-12-04T13:38:32.1939041Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1939205Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1939493Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1939686Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1939972Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1940095Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1940385Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1940553Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1940833Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1940982Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1941256Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1941394Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1941671Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1941834Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1942320Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 184832 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:38:32.1942437Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1942636Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1943008Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1943138Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1943349Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1943517Z [rank2]:E1204 13:30:59.015000 420902 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1943555Z dist init r=2, world=4 2025-12-04T13:38:32.1943693Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1943855Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1944144Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1944302Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1944596Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1944722Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1945009Z [rank0]:E1204 13:30:59.082000 420900 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1945158Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1945435Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1945585Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1945863Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1946009Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1946290Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1946443Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1946931Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 186880 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 
2025-12-04T13:38:32.1947051Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1947246Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1947629Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1947748Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1947959Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1948130Z [rank0]:E1204 13:30:59.082000 420900 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1948169Z dist init r=0, world=4 2025-12-04T13:38:32.1948309Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1948470Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1948769Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1948923Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1949222Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1949346Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1949659Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1949810Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1950088Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1950255Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1950531Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1950672Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1950950Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1951103Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1951591Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.1951720Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1951917Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1952286Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1952402Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1952615Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1952780Z [rank1]:E1204 13:30:59.096000 420901 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1952822Z dist init r=1, world=4 2025-12-04T13:38:32.1953170Z [rank0]:[W1204 13:30:59.349308445 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1953214Z FAILED [9.5190s] [ 7%] 2025-12-04T13:38:32.1953216Z 2025-12-04T13:38:32.1953287Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1953399Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.1953446Z Traceback (most recent call last): 2025-12-04T13:38:32.1953614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1953658Z self._join_processes(fn) 2025-12-04T13:38:32.1953836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1953891Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1954074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1954119Z raise RuntimeError(error) 2025-12-04T13:38:32.1954203Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1954252Z Traceback (most recent call last): 2025-12-04T13:38:32.1954429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1954474Z getattr(self, test_name)() 2025-12-04T13:38:32.1954634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1954674Z fn() 2025-12-04T13:38:32.1954828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1954874Z method(*args, **kwargs) 2025-12-04T13:38:32.1955027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1955071Z method(*args, **kwargs) 2025-12-04T13:38:32.1955222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1955266Z with policy(): 2025-12-04T13:38:32.1955420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1955465Z raise RuntimeError(msg) 2025-12-04T13:38:32.1955836Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 180736 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 
2025-12-04T13:38:32.1955839Z 2025-12-04T13:38:32.1955918Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1956160Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1956162Z 2025-12-04T13:38:32.1956253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1956256Z 2025-12-04T13:38:32.1956258Z 2025-12-04T13:38:32.1956338Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1956426Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1956665Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9b67bb4b5b795d1e.xml - 2025-12-04T13:38:32.1956727Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1957002Z FAILED [9.5190s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1957049Z Traceback (most recent call last): 2025-12-04T13:38:32.1957219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1957273Z getattr(self, test_name)() 2025-12-04T13:38:32.1957437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1957474Z fn() 2025-12-04T13:38:32.1957632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1957673Z method(*args, **kwargs) 2025-12-04T13:38:32.1957828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1957868Z method(*args, **kwargs) 2025-12-04T13:38:32.1958022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1958060Z with policy(): 2025-12-04T13:38:32.1958216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1958272Z raise RuntimeError(msg) 2025-12-04T13:38:32.1958635Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 180736 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.1958637Z 2025-12-04T13:38:32.1958717Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1958959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1958962Z 2025-12-04T13:38:32.1959054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1959118Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
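The repro block above also shows how this failure is detected: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the harness snapshots caching-allocator and driver-level memory before the test body and compares the counters afterwards, raising when either one grows. The following is only an illustrative sketch of that comparison using public torch.cuda APIs, not the actual CudaMemoryLeakCheck code in common_utils.py; run_test_body is a hypothetical callable standing in for the test.

    import torch

    def check_for_leak(run_test_body, device=0):
        # Illustrative leak check: snapshot allocator and driver memory, run the test,
        # then compare, mirroring the numbers quoted in the RuntimeError above.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)

        run_test_body()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)

        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak on device {device}: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver-allocated {total - free_before} -> {total - free_after} bytes"
            )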
2025-12-04T13:38:32.1959186Z ======================= 1 failed, 20 deselected in 9.68s ======================= 2025-12-04T13:38:32.1959226Z Got exit code 1 2025-12-04T13:38:32.1959272Z Retrying single test... 2025-12-04T13:38:32.1959463Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-09076c94ea287f91.xml 2025-12-04T13:38:32.1959540Z ============================= test session starts ============================== 2025-12-04T13:38:32.1959686Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.1959731Z cachedir: .pytest_cache 2025-12-04T13:38:32.1959892Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.1959943Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.1959984Z configfile: pytest.ini 2025-12-04T13:38:32.1960151Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.1960230Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.1960464Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1960510Z Running 1 items in this shard 2025-12-04T13:38:32.1960513Z 2025-12-04T13:38:32.1960827Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda I1204 13:31:03.493000 421233 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 421302 2025-12-04T13:38:32.1961000Z I1204 13:31:03.494000 421233 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 421303 2025-12-04T13:38:32.1961155Z I1204 13:31:03.495000 421233 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 421304 2025-12-04T13:38:32.1961325Z I1204 13:31:03.495000 421233 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 421305 2025-12-04T13:38:32.1961614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1961668Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1962248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1962302Z _warn_cpu_init() 2025-12-04T13:38:32.1962589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.1962635Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1962917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1962962Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1963536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1963579Z _warn_cpu_init() 2025-12-04T13:38:32.1964174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.1964215Z _warn_cpu_init() 2025-12-04T13:38:32.1964505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1964599Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1964884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1964973Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1965260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1965358Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1965640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1965697Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.1966276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.1966313Z _warn_cpu_init() 2025-12-04T13:38:32.1966606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.1966694Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.1966936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1966983Z return func(*args, **kwargs) 2025-12-04T13:38:32.1967208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1967255Z return func(*args, **kwargs) 2025-12-04T13:38:32.1967478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1967523Z return func(*args, **kwargs) 2025-12-04T13:38:32.1967745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1967791Z return func(*args, **kwargs) 2025-12-04T13:38:32.1968012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1968056Z return func(*args, **kwargs) 2025-12-04T13:38:32.1968289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1968333Z return func(*args, **kwargs) 2025-12-04T13:38:32.1968556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1968600Z return func(*args, **kwargs) 2025-12-04T13:38:32.1968823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.1968866Z return func(*args, **kwargs) 2025-12-04T13:38:32.1969162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.1969203Z return func(*args, **kwargs) 2025-12-04T13:38:32.1970540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node.
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1970683Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1971943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1972086Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1973368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1973493Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1974757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.1974882Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.1975048Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1975212Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1975510Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1975667Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1975958Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1976086Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1976382Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1976539Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1976817Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1976969Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1977252Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1977395Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1977691Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1977844Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1978341Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.1978458Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1978658Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1979028Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1979159Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1979380Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1979557Z [rank3]:E1204 13:31:11.015000 421305 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.1979670Z dist init r=3, world=4 2025-12-04T13:38:32.1979809Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1979973Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1980264Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1980422Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1980726Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1980853Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1981135Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1981283Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1981565Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1981714Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1981993Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1982147Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1982430Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1982582Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1983075Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:38:32.1983194Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1983391Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1983778Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1983909Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1984125Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1984295Z [rank0]:E1204 13:31:11.027000 421302 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.1984334Z dist init r=0, world=4 2025-12-04T13:38:32.1984476Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1984638Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1984930Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1985115Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1985411Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1985535Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1985818Z [rank2]:E1204 13:31:11.071000 421304 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1985973Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1986252Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1986417Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1986697Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1986839Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.1987121Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1987273Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1987771Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 184832 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 
2025-12-04T13:38:32.1987895Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1988094Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1988473Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1988594Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1988810Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1988983Z [rank2]:E1204 13:31:11.071000 421304 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.1989027Z dist init r=2, world=4 2025-12-04T13:38:32.1989170Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.1989360Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.1989687Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1989851Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.1990150Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1990279Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.1990565Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1990720Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1991023Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1991176Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.1991462Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1991602Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.1991891Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1992042Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.1992562Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.1992697Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1992897Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1993276Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1993393Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.1993612Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1993794Z [rank1]:E1204 13:31:11.073000 421303 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.1993837Z dist init r=1, world=4 2025-12-04T13:38:32.1994184Z [rank0]:[W1204 13:31:11.228304080 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.1994225Z FAILED [9.4192s] [100%] 2025-12-04T13:38:32.1994227Z 2025-12-04T13:38:32.1994289Z =================================== FAILURES =================================== 2025-12-04T13:38:32.1994400Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.1994452Z Traceback (most recent call last): 2025-12-04T13:38:32.1994623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.1994673Z self._join_processes(fn) 2025-12-04T13:38:32.1994854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.1994914Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.1995108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.1995159Z raise RuntimeError(error) 2025-12-04T13:38:32.1995242Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1995292Z Traceback (most recent call last): 2025-12-04T13:38:32.1995459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1995507Z getattr(self, test_name)() 2025-12-04T13:38:32.1995672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1995713Z fn() 2025-12-04T13:38:32.1995870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1995915Z method(*args, **kwargs) 2025-12-04T13:38:32.1996073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1996118Z method(*args, **kwargs) 2025-12-04T13:38:32.1996277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1996316Z with policy(): 2025-12-04T13:38:32.1996489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1996533Z raise RuntimeError(msg) 2025-12-04T13:38:32.1996918Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 
2025-12-04T13:38:32.1996922Z 2025-12-04T13:38:32.1996999Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.1997251Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.1997254Z 2025-12-04T13:38:32.1997344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.1997347Z 2025-12-04T13:38:32.1997349Z 2025-12-04T13:38:32.1997430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.1997523Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.1997776Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-09076c94ea287f91.xml - 2025-12-04T13:38:32.1997841Z =========================== short test summary info ============================ 2025-12-04T13:38:32.1998103Z FAILED [9.4192s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.1998155Z Traceback (most recent call last): 2025-12-04T13:38:32.1998324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.1998374Z getattr(self, test_name)() 2025-12-04T13:38:32.1998541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.1998582Z fn() 2025-12-04T13:38:32.1998745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1998790Z method(*args, **kwargs) 2025-12-04T13:38:32.1998952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.1998997Z method(*args, **kwargs) 2025-12-04T13:38:32.1999168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.1999213Z with policy(): 2025-12-04T13:38:32.1999374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.1999422Z raise RuntimeError(msg) 2025-12-04T13:38:32.1999880Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 2025-12-04T13:38:32.1999887Z 2025-12-04T13:38:32.1999966Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2000221Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2000224Z 2025-12-04T13:38:32.2000316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2000387Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
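Both the original run and this retry also end with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. A minimal sketch of the teardown that warning asks for, assuming a conventional torch.distributed setup (the surrounding harness is not shown in this log):

    import torch.distributed as dist

    def shutdown_distributed():
        # Tear down the default process group before the process exits,
        # releasing NCCL/RCCL communicator resources as the warning suggests.
        if dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()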
2025-12-04T13:38:32.2000469Z ======================= 1 failed, 32 deselected in 9.58s ======================= 2025-12-04T13:38:32.2000515Z Got exit code 1 2025-12-04T13:38:32.2000559Z Retrying single test... 2025-12-04T13:38:32.2000763Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b807e692337b1cb2.xml 2025-12-04T13:38:32.2000842Z ============================= test session starts ============================== 2025-12-04T13:38:32.2000965Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2001010Z cachedir: .pytest_cache 2025-12-04T13:38:32.2001182Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2001232Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2001279Z configfile: pytest.ini 2025-12-04T13:38:32.2001455Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2001537Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2001783Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2001850Z Running 1 items in this shard 2025-12-04T13:38:32.2001853Z 2025-12-04T13:38:32.2002189Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda I1204 13:31:15.477000 421635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 421704 2025-12-04T13:38:32.2002355Z I1204 13:31:15.478000 421635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 421705 2025-12-04T13:38:32.2002519Z I1204 13:31:15.478000 421635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 421706 2025-12-04T13:38:32.2002681Z I1204 13:31:15.479000 421635 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 421707 2025-12-04T13:38:32.2002988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2003040Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.2003670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2003714Z _warn_cpu_init() 2025-12-04T13:38:32.2004019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.2004117Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.2004414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2004466Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.2004762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2004814Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.2005122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2005190Z return wrapper_cls(module, **kwargs) 2025-12-04T13:38:32.2005809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2005849Z _warn_cpu_init() 2025-12-04T13:38:32.2006453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2006505Z _warn_cpu_init() 2025-12-04T13:38:32.2007108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2007152Z _warn_cpu_init() 2025-12-04T13:38:32.2007460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2007556Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.2007859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.2007953Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.2008265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2008358Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T13:38:32.2008603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2008653Z return func(*args, **kwargs) 2025-12-04T13:38:32.2008900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2008947Z return func(*args, **kwargs) 2025-12-04T13:38:32.2009192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2009238Z return func(*args, **kwargs) 2025-12-04T13:38:32.2009480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2009525Z return func(*args, **kwargs) 2025-12-04T13:38:32.2009843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2009902Z return func(*args, **kwargs) 2025-12-04T13:38:32.2010145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2010189Z return func(*args, **kwargs) 2025-12-04T13:38:32.2010430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2010474Z return func(*args, **kwargs) 2025-12-04T13:38:32.2010716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2010761Z return func(*args, **kwargs) 2025-12-04T13:38:32.2011081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2011145Z return func(*args, **kwargs) 2025-12-04T13:38:32.2012529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.2012671Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.2014049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.2014186Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.2015554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.2015702Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.2017066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T13:38:32.2017210Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T13:38:32.2017367Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2017547Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2017866Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2018038Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2018360Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2018499Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2018802Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2018969Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2019273Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2019435Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2019796Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2019945Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2020272Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2020438Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2020979Z [rank1]:E1204 13:31:23.001000 421705 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.2021108Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2021322Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2021744Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2021868Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2022104Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2022288Z [rank1]:E1204 13:31:23.001000 421705 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2022332Z dist init r=1, world=4 2025-12-04T13:38:32.2022485Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2022660Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2022994Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2023162Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2023481Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2023615Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2023921Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2024086Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2024386Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2024567Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2024868Z [rank0]:E1204 
13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2025034Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2025339Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2025504Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2026038Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 0. CUDA driver allocated memory was 2453667840 and is now 4011851776. 2025-12-04T13:38:32.2026175Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2026392Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2026793Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2026919Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2027149Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2027335Z [rank0]:E1204 13:31:23.014000 421704 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2027378Z dist init r=0, world=4 2025-12-04T13:38:32.2027530Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2027717Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2028033Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2028204Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2028514Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2028650Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2028951Z 
[rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2029125Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2029428Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2029619Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2029922Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2030069Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2030374Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2030535Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2031081Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 172544 on device 3. CUDA driver allocated memory was 2250244096 and is now 3808428032. 
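The `NO_SHARD` FutureWarning repeated above already names the suggested replacement: plain DistributedDataParallel. A minimal sketch of that substitution, assuming the default process group is already initialized and `rank` is the local GPU index; the helper name and toy model handling are illustrative, not taken from the test:

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(model: nn.Module, rank: int) -> DDP:
    # Rough equivalent of FSDP's deprecated NO_SHARD strategy: every rank keeps
    # the full parameters and gradients are all-reduced instead of sharded.
    model = model.to(torch.device("cuda", rank))
    return DDP(model, device_ids=[rank])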
2025-12-04T13:38:32.2031207Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2031419Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2031815Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2031939Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2032172Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2032366Z [rank3]:E1204 13:31:23.058000 421707 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2032411Z dist init r=3, world=4 2025-12-04T13:38:32.2032560Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2032737Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2033052Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2033220Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2033531Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2033663Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2033979Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2034154Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2034454Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2034617Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2034919Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2035069Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2035370Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2035545Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2036068Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 188928 on device 2. CUDA driver allocated memory was 2300575744 and is now 3858759680. 2025-12-04T13:38:32.2036195Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2036409Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2036810Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2036947Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2037175Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2037357Z [rank2]:E1204 13:31:23.078000 421706 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2037399Z dist init r=2, world=4 2025-12-04T13:38:32.2037767Z [rank0]:[W1204 13:31:23.200657596 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2037814Z FAILED [9.4200s] [100%] 2025-12-04T13:38:32.2037816Z 2025-12-04T13:38:32.2037878Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2037997Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda _ 2025-12-04T13:38:32.2038047Z Traceback (most recent call last): 2025-12-04T13:38:32.2038228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2038287Z self._join_processes(fn) 2025-12-04T13:38:32.2038479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2038552Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2038748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2038797Z raise RuntimeError(error) 2025-12-04T13:38:32.2038886Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2038935Z Traceback (most recent call last): 2025-12-04T13:38:32.2039116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2039162Z getattr(self, test_name)() 2025-12-04T13:38:32.2039338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2039376Z fn() 2025-12-04T13:38:32.2039547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2039647Z method(*args, **kwargs) 2025-12-04T13:38:32.2039817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2039862Z method(*args, **kwargs) 2025-12-04T13:38:32.2040031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2040071Z with policy(): 2025-12-04T13:38:32.2040245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2040292Z raise RuntimeError(msg) 2025-12-04T13:38:32.2040687Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 
2025-12-04T13:38:32.2040691Z 2025-12-04T13:38:32.2040777Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2041040Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2041043Z 2025-12-04T13:38:32.2041142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2041166Z 2025-12-04T13:38:32.2041168Z 2025-12-04T13:38:32.2041252Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2041350Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2041604Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b807e692337b1cb2.xml - 2025-12-04T13:38:32.2041675Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2041957Z FAILED [9.4200s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2042007Z Traceback (most recent call last): 2025-12-04T13:38:32.2042186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2042234Z getattr(self, test_name)() 2025-12-04T13:38:32.2042411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2042449Z fn() 2025-12-04T13:38:32.2042633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2042677Z method(*args, **kwargs) 2025-12-04T13:38:32.2042845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2042903Z method(*args, **kwargs) 2025-12-04T13:38:32.2043072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2043112Z with policy(): 2025-12-04T13:38:32.2043281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2043328Z raise RuntimeError(msg) 2025-12-04T13:38:32.2043727Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 176640 on device 1. CUDA driver allocated memory was 2317352960 and is now 3875536896. 2025-12-04T13:38:32.2043729Z 2025-12-04T13:38:32.2043810Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2044075Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2044091Z 2025-12-04T13:38:32.2044188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2044257Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
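Two of the warnings interleaved with this failure concern process-group hygiene rather than the leak itself: `barrier(): using the device under current context` (addressed by passing `device_id` to `init_process_group`) and `destroy_process_group() was not called before program exit`. A minimal sketch of a worker that handles both, assuming a torchrun-style launch where LOCAL_RANK is set in the environment; the function name and placeholder body are illustrative:

import os
import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Binding the process group to a device avoids the barrier() UserWarning.
    dist.init_process_group(backend="nccl", device_id=torch.device("cuda", local_rank))
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL resource-leak warning.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()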
2025-12-04T13:38:32.2044329Z ======================= 1 failed, 32 deselected in 9.58s ======================= 2025-12-04T13:38:32.2044370Z Got exit code 1 2025-12-04T13:38:32.2044577Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda 2025-12-04T13:38:32.2044716Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2044923Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65f9ef345ca7c3f2.xml 2025-12-04T13:38:32.2044987Z ============================= test session starts ============================== 2025-12-04T13:38:32.2045116Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2045161Z cachedir: .pytest_cache 2025-12-04T13:38:32.2045337Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2045399Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2045448Z configfile: pytest.ini 2025-12-04T13:38:32.2045625Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2045709Z collecting ... collected 60 items / 21 deselected / 39 selected 2025-12-04T13:38:32.2045767Z stepcurrent: skipping 21 already run items. 2025-12-04T13:38:32.2045817Z Running 12 items in this shard 2025-12-04T13:38:32.2045820Z 2025-12-04T13:38:32.2046173Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda I1204 13:31:27.411000 422037 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 422106 2025-12-04T13:38:32.2046341Z I1204 13:31:27.411000 422037 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 422107 2025-12-04T13:38:32.2046513Z I1204 13:31:27.412000 422037 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 422108 2025-12-04T13:38:32.2046676Z I1204 13:31:27.412000 422037 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 422109 2025-12-04T13:38:32.2047317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2047370Z _warn_cpu_init() 2025-12-04T13:38:32.2047995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2048039Z _warn_cpu_init() 2025-12-04T13:38:32.2048651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2048708Z _warn_cpu_init() 2025-12-04T13:38:32.2049324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2049369Z _warn_cpu_init() 2025-12-04T13:38:32.2049736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2049784Z return func(*args, **kwargs) 2025-12-04T13:38:32.2049941Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2050131Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2050451Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2050617Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2050929Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2051066Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2051370Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2051534Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2051847Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2052023Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2052321Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2052473Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2052775Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2052940Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2053479Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2053619Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2053833Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2054238Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2054366Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2054600Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2054789Z [rank0]:E1204 13:31:35.106000 422106 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2054836Z dist init r=0, world=4 2025-12-04T13:38:32.2054983Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2055160Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2055473Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2055645Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2055956Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2056093Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2056408Z [rank2]:E1204 13:31:35.108000 422108 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2056568Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2056879Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2057038Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2057339Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2057486Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2057789Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2057970Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2058504Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 
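The `_warn_cpu_init()` UserWarning repeated above is emitted when FSDP is constructed from a module that still lives on CPU. A minimal sketch of the pattern the warning recommends, assuming the process group is already initialized; the toy model and `rank` argument are illustrative, not taken from the test suite:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import always_wrap_policy

def wrap_on_gpu(rank: int) -> FSDP:
    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))  # built on CPU
    return FSDP(
        model,
        auto_wrap_policy=always_wrap_policy,
        device_id=torch.device("cuda", rank),  # move to GPU before sharding init
        sync_module_states=True,               # requires the module to be on GPU
    )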
2025-12-04T13:38:32.2058631Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2058841Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2059252Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2059375Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2059663Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2059846Z [rank2]:E1204 13:31:35.108000 422108 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2059888Z dist init r=2, world=4 2025-12-04T13:38:32.2060039Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2060213Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2060530Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2060696Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2061018Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2061153Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2061469Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2061630Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2061928Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2062090Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2062390Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2062553Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2062852Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2063017Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2063552Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2063676Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2063891Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2064304Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2064430Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2064658Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2064839Z [rank3]:E1204 13:31:35.118000 422109 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2064886Z dist init r=3, world=4 2025-12-04T13:38:32.2065034Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2065211Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2065527Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2065710Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2066016Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2066164Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2066464Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2066625Z [rank1]:E1204 13:31:35.161000 422107 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2066927Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2067086Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2067397Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2067544Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2067847Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2068009Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2068540Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:38:32.2068678Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2068888Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2069291Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2069415Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2069675Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2069853Z [rank1]:E1204 13:31:35.161000 422107 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2069899Z dist init r=1, world=4 2025-12-04T13:38:32.2070280Z [rank0]:[W1204 13:31:35.273776708 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2070324Z FAILED [9.5204s] [ 8%] 2025-12-04T13:38:32.2070326Z 2025-12-04T13:38:32.2070389Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2070526Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2070580Z Traceback (most recent call last): 2025-12-04T13:38:32.2070757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2070807Z self._join_processes(fn) 2025-12-04T13:38:32.2070994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2071056Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2071250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2071301Z raise RuntimeError(error) 2025-12-04T13:38:32.2071387Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2071442Z Traceback (most recent call last): 2025-12-04T13:38:32.2071630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2071678Z getattr(self, test_name)() 2025-12-04T13:38:32.2071851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2071891Z fn() 2025-12-04T13:38:32.2072058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2072106Z method(*args, **kwargs) 2025-12-04T13:38:32.2072271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2072318Z method(*args, **kwargs) 2025-12-04T13:38:32.2072485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2072528Z with policy(): 2025-12-04T13:38:32.2072699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2072744Z raise RuntimeError(msg) 2025-12-04T13:38:32.2073163Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
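The RuntimeError text above comes from the memory-leak check enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: allocator and driver memory are sampled before the test and again after it, and growth on any device is reported as a leak. Below is a deliberately simplified illustration of that before/after comparison, not the actual implementation (the real check lives in torch/testing/_internal/common_utils.py and queries the CUDA driver per device; `memory_reserved` is used here only as a stand-in for the driver-level figure):

import torch

def run_with_leak_check(test_fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    allocated_before = torch.cuda.memory_allocated(device)
    reserved_before = torch.cuda.memory_reserved(device)

    test_fn()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    allocated_after = torch.cuda.memory_allocated(device)
    reserved_after = torch.cuda.memory_reserved(device)

    # Flag the test if both the caching allocator and the reserved pool grew,
    # mirroring the shape of the error message printed in this log.
    if allocated_after > allocated_before and reserved_after > reserved_before:
        raise RuntimeError(
            f"possible CUDA memory leak on device {device}: caching allocator "
            f"{allocated_before} -> {allocated_after} bytes, "
            f"reserved {reserved_before} -> {reserved_after} bytes"
        )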
2025-12-04T13:38:32.2073165Z 2025-12-04T13:38:32.2073248Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2073520Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2073522Z 2025-12-04T13:38:32.2073619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2073622Z 2025-12-04T13:38:32.2073624Z 2025-12-04T13:38:32.2073708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2073805Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2074060Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65f9ef345ca7c3f2.xml - 2025-12-04T13:38:32.2074128Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2074423Z FAILED [9.5204s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2074476Z Traceback (most recent call last): 2025-12-04T13:38:32.2074653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2074722Z getattr(self, test_name)() 2025-12-04T13:38:32.2074895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2074935Z fn() 2025-12-04T13:38:32.2075099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2075144Z method(*args, **kwargs) 2025-12-04T13:38:32.2075308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2075352Z method(*args, **kwargs) 2025-12-04T13:38:32.2075518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2075560Z with policy(): 2025-12-04T13:38:32.2075725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2075785Z raise RuntimeError(msg) 2025-12-04T13:38:32.2076178Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2076183Z 2025-12-04T13:38:32.2076263Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2076529Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2076532Z 2025-12-04T13:38:32.2076624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2076693Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
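The failure reported above comes from the CUDA memory-leak checker that the repro command enables with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 (alongside PYTORCH_TEST_WITH_ROCM=1 on this runner): it snapshots caching-allocator and driver-reported memory before the test body and raises once both have grown afterwards, which is why each rank prints "Caching allocator allocated memory was 512 and is now reported as ...". A rough Python sketch of that idea follows; check_leak is a placeholder name for illustration only, not the actual CudaMemoryLeakCheck context manager in torch/testing/_internal/common_utils.py.

    import torch

    def check_leak(fn, device=0):
        # Snapshot caching-allocator usage and driver-level free memory before the test.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)     # caching-allocator bytes in use
        free_before, _total = torch.cuda.mem_get_info(device)  # driver-reported free bytes
        fn()
        # Re-snapshot after the test; flag a leak only if both views show growth.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _total = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before and free_after < free_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver free "
                f"{free_before} -> {free_after} bytes"
            )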
2025-12-04T13:38:32.2076760Z ======================= 1 failed, 21 deselected in 9.68s ======================= 2025-12-04T13:38:32.2076801Z Got exit code 1 2025-12-04T13:38:32.2076843Z Retrying single test... 2025-12-04T13:38:32.2077049Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b513d8e2c5d71c6.xml 2025-12-04T13:38:32.2077121Z ============================= test session starts ============================== 2025-12-04T13:38:32.2077246Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2077289Z cachedir: .pytest_cache 2025-12-04T13:38:32.2077462Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2077511Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2077556Z configfile: pytest.ini 2025-12-04T13:38:32.2077731Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2077813Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2078074Z stepcurrent: skipping 21 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2078122Z Running 1 items in this shard 2025-12-04T13:38:32.2078126Z 2025-12-04T13:38:32.2078477Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda I1204 13:31:39.430000 422439 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 422508 2025-12-04T13:38:32.2078654Z I1204 13:31:39.431000 422439 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 422509 2025-12-04T13:38:32.2078820Z I1204 13:31:39.432000 422439 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 422510 2025-12-04T13:38:32.2078993Z I1204 13:31:39.432000 422439 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 422511 2025-12-04T13:38:32.2079707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2079749Z _warn_cpu_init() 2025-12-04T13:38:32.2080366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2080423Z _warn_cpu_init() 2025-12-04T13:38:32.2081040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2081082Z _warn_cpu_init() 2025-12-04T13:38:32.2081699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2081739Z _warn_cpu_init() 2025-12-04T13:38:32.2082068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2082114Z return func(*args, **kwargs) 2025-12-04T13:38:32.2082270Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2082444Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2082760Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2082927Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2083240Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2083394Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2083695Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2083870Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2084168Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2084329Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2084626Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2084774Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2085086Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2085244Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2085778Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:38:32.2085901Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2086114Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2086529Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2086653Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2086883Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2087058Z [rank1]:E1204 13:31:47.225000 422509 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2087104Z dist init r=1, world=4 2025-12-04T13:38:32.2087251Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2087424Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2087732Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2087899Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2088221Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2088368Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2088670Z [rank3]:E1204 13:31:47.231000 422511 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2088828Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2089128Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2089285Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2089614Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2089774Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2090075Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2090235Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2090764Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 68096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
2025-12-04T13:38:32.2090890Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2091113Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2091516Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2091639Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2091869Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2092047Z [rank3]:E1204 13:31:47.231000 422511 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2092087Z dist init r=3, world=4 2025-12-04T13:38:32.2092237Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2092408Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2092740Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2092906Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2093227Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2093359Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2093659Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2093819Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2094114Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2094286Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2094584Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2094732Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2095031Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2095193Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2095738Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2095861Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2096075Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2096477Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2096600Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2096829Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2097004Z [rank0]:E1204 13:31:47.238000 422508 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2097047Z dist init r=0, world=4 2025-12-04T13:38:32.2097206Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2097381Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2097702Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2097867Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2098173Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2098308Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2098611Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2098782Z [rank2]:E1204 13:31:47.252000 422510 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2099081Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2099239Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2099537Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2099719Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2100021Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2100182Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2100730Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 74240 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:38:32.2100853Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2101065Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2101469Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2101590Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2101836Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2102014Z [rank2]:E1204 13:31:47.252000 422510 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2102071Z dist init r=2, world=4 2025-12-04T13:38:32.2102434Z [rank0]:[W1204 13:31:47.473401276 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2102475Z FAILED [9.6209s] [100%] 2025-12-04T13:38:32.2102478Z 2025-12-04T13:38:32.2102543Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2102663Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2102713Z Traceback (most recent call last): 2025-12-04T13:38:32.2102890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2102937Z self._join_processes(fn) 2025-12-04T13:38:32.2103122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2103202Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2103393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2103442Z raise RuntimeError(error) 2025-12-04T13:38:32.2103525Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2103577Z Traceback (most recent call last): 2025-12-04T13:38:32.2103751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2103799Z getattr(self, test_name)() 2025-12-04T13:38:32.2103971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2104010Z fn() 2025-12-04T13:38:32.2104175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2104222Z method(*args, **kwargs) 2025-12-04T13:38:32.2104386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2104428Z method(*args, **kwargs) 2025-12-04T13:38:32.2104603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2104643Z with policy(): 2025-12-04T13:38:32.2104810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2104853Z raise RuntimeError(msg) 2025-12-04T13:38:32.2105248Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 68096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
2025-12-04T13:38:32.2105252Z 2025-12-04T13:38:32.2105333Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2105601Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2105603Z 2025-12-04T13:38:32.2105698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2105702Z 2025-12-04T13:38:32.2105704Z 2025-12-04T13:38:32.2105786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2105892Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2106146Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b513d8e2c5d71c6.xml - 2025-12-04T13:38:32.2106225Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2106507Z FAILED [9.6209s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2106558Z Traceback (most recent call last): 2025-12-04T13:38:32.2106737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2106784Z getattr(self, test_name)() 2025-12-04T13:38:32.2106957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2106997Z fn() 2025-12-04T13:38:32.2107162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2107208Z method(*args, **kwargs) 2025-12-04T13:38:32.2107383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2107427Z method(*args, **kwargs) 2025-12-04T13:38:32.2107591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2107632Z with policy(): 2025-12-04T13:38:32.2107797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2107842Z raise RuntimeError(msg) 2025-12-04T13:38:32.2108242Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 68096 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2108245Z 2025-12-04T13:38:32.2108325Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2108598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2108600Z 2025-12-04T13:38:32.2108694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2108783Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
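Each retry also repeats the same _warn_cpu_init() UserWarning once per rank, because the test hands a CPU-resident module to FSDP. The warning's own recommendation is to pass device_id so FSDP moves the module to the accelerator before running sharding initialization. A minimal single-process sketch of that, assuming one CUDA/ROCm device and a one-rank NCCL process group (the test itself spawns four ranks through the internal multiprocessing harness instead), is:

    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # One-rank process group, only so FSDP can be constructed in a single process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)

    model = torch.nn.Linear(8, 8)  # starts on CPU, like the module in the test above
    # device_id makes FSDP move the module to the GPU before sharding init,
    # which is exactly what the UserWarning suggests.
    fsdp_model = FSDP(model, device_id=torch.cuda.current_device())

    dist.destroy_process_group()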
2025-12-04T13:38:32.2108849Z ======================= 1 failed, 32 deselected in 9.78s ======================= 2025-12-04T13:38:32.2108891Z Got exit code 1 2025-12-04T13:38:32.2108933Z Retrying single test... 2025-12-04T13:38:32.2109139Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa44c67e35bd4a1b.xml 2025-12-04T13:38:32.2109201Z ============================= test session starts ============================== 2025-12-04T13:38:32.2109328Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2109373Z cachedir: .pytest_cache 2025-12-04T13:38:32.2109544Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2109641Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2109687Z configfile: pytest.ini 2025-12-04T13:38:32.2109864Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2109943Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2110229Z stepcurrent: skipping 21 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2110275Z Running 1 items in this shard 2025-12-04T13:38:32.2110294Z 2025-12-04T13:38:32.2110641Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda I1204 13:31:51.615000 422841 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 422910 2025-12-04T13:38:32.2110808Z I1204 13:31:51.616000 422841 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 422911 2025-12-04T13:38:32.2110977Z I1204 13:31:51.616000 422841 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 422912 2025-12-04T13:38:32.2111139Z I1204 13:31:51.617000 422841 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 422913 2025-12-04T13:38:32.2111957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2112019Z _warn_cpu_init() 2025-12-04T13:38:32.2112638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2112681Z _warn_cpu_init() 2025-12-04T13:38:32.2113293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2113336Z _warn_cpu_init() 2025-12-04T13:38:32.2113979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2114018Z _warn_cpu_init() 2025-12-04T13:38:32.2114333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2114380Z return func(*args, **kwargs) 2025-12-04T13:38:32.2114534Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2114709Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2115027Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2115206Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2115514Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2115661Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2115960Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2116123Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2116423Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2116584Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2116891Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2117039Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2117338Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2117498Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2118030Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 74240 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:38:32.2118155Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2118382Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2118787Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2118909Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2119138Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2119313Z [rank2]:E1204 13:31:59.351000 422912 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2119462Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2119699Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2120024Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2120204Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2120511Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2120645Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2120942Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2121102Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2121402Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2121583Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2121882Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2122031Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2122332Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2122491Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2123038Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
2025-12-04T13:38:32.2123160Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2123374Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2123780Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2123901Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2124131Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2124308Z [rank3]:E1204 13:31:59.351000 422913 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2124361Z dist init r=3, world=4 2025-12-04T13:38:32.2124402Z dist init r=2, world=4 2025-12-04T13:38:32.2124551Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2124734Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2125044Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2125210Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2125522Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2125656Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2125957Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2126138Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2126441Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2126601Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2126902Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2127050Z [rank1]:E1204 13:31:59.392000 422911 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2127352Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2127523Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2128052Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 70144 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:38:32.2128176Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2128388Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2128788Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2128911Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2129152Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2129337Z [rank1]:E1204 13:31:59.392000 422911 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2129381Z dist init r=1, world=4 2025-12-04T13:38:32.2129527Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2129793Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2130106Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2130274Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2130580Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2130730Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2131031Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2131190Z [rank0]:E1204 
13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2131494Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2131653Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2131953Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2132112Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2132411Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2132573Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2133097Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2133221Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2133430Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2133846Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2133982Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2134212Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2134390Z [rank0]:E1204 13:31:59.411000 422910 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2134432Z dist init r=0, world=4 2025-12-04T13:38:32.2134796Z [rank0]:[W1204 13:31:59.679826963 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2134837Z FAILED [9.6195s] [100%] 2025-12-04T13:38:32.2134839Z 2025-12-04T13:38:32.2134901Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2135033Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2135085Z Traceback (most recent call last): 2025-12-04T13:38:32.2135260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2135308Z self._join_processes(fn) 2025-12-04T13:38:32.2135495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2135554Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2135747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2135796Z raise RuntimeError(error) 2025-12-04T13:38:32.2135883Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2135932Z Traceback (most recent call last): 2025-12-04T13:38:32.2136110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2136157Z getattr(self, test_name)() 2025-12-04T13:38:32.2136330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2136366Z fn() 2025-12-04T13:38:32.2136545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2136589Z method(*args, **kwargs) 2025-12-04T13:38:32.2136755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2136799Z method(*args, **kwargs) 2025-12-04T13:38:32.2136963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2137004Z with policy(): 2025-12-04T13:38:32.2137171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2137215Z raise RuntimeError(msg) 2025-12-04T13:38:32.2137612Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
2025-12-04T13:38:32.2137615Z 2025-12-04T13:38:32.2137694Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2137972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2137974Z 2025-12-04T13:38:32.2138070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2138084Z 2025-12-04T13:38:32.2138086Z 2025-12-04T13:38:32.2138167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2138263Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2141386Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aa44c67e35bd4a1b.xml - 2025-12-04T13:38:32.2141458Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2141745Z FAILED [9.6195s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2141796Z Traceback (most recent call last): 2025-12-04T13:38:32.2141982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2142059Z getattr(self, test_name)() 2025-12-04T13:38:32.2142234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2142275Z fn() 2025-12-04T13:38:32.2142441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2142487Z method(*args, **kwargs) 2025-12-04T13:38:32.2142651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2142695Z method(*args, **kwargs) 2025-12-04T13:38:32.2142859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2142900Z with policy(): 2025-12-04T13:38:32.2143065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2143113Z raise RuntimeError(msg) 2025-12-04T13:38:32.2143510Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2143512Z 2025-12-04T13:38:32.2143610Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2143875Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2143881Z 2025-12-04T13:38:32.2143975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2144045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.2144113Z ======================= 1 failed, 32 deselected in 9.78s =======================
2025-12-04T13:38:32.2144155Z Got exit code 1
2025-12-04T13:38:32.2144367Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2144506Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:38:32.2144710Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-58ee2716c244f3f6.xml
2025-12-04T13:38:32.2144775Z ============================= test session starts ==============================
2025-12-04T13:38:32.2144915Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:38:32.2144962Z cachedir: .pytest_cache
2025-12-04T13:38:32.2145132Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:38:32.2145213Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:38:32.2145256Z configfile: pytest.ini
2025-12-04T13:38:32.2145434Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:38:32.2145513Z collecting ... collected 60 items / 22 deselected / 38 selected
2025-12-04T13:38:32.2145573Z stepcurrent: skipping 22 already run items.
2025-12-04T13:38:32.2145619Z Running 11 items in this shard
2025-12-04T13:38:32.2145622Z
2025-12-04T13:38:32.2145969Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 13:32:03.794000 423243 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 423312
2025-12-04T13:38:32.2146138Z I1204 13:32:03.795000 423243 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 423313
2025-12-04T13:38:32.2146315Z I1204 13:32:03.795000 423243 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 423314
2025-12-04T13:38:32.2146479Z I1204 13:32:03.795000 423243 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 423315
2025-12-04T13:38:32.2147116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:38:32.2147158Z _warn_cpu_init()
2025-12-04T13:38:32.2147768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:38:32.2147813Z _warn_cpu_init() 2025-12-04T13:38:32.2148145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2148190Z return func(*args, **kwargs) 2025-12-04T13:38:32.2148805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2148845Z _warn_cpu_init() 2025-12-04T13:38:32.2149458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2149510Z _warn_cpu_init() 2025-12-04T13:38:32.2149701Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2149893Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2150203Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2150373Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2150679Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2150817Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2151115Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2151291Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2151592Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2151749Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2152051Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2152198Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2152498Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2152670Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2153201Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2153328Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2153538Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2153944Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2154067Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2154309Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2154486Z [rank3]:E1204 13:32:11.746000 423315 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2154542Z dist init r=3, world=4 2025-12-04T13:38:32.2154690Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2154862Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2155174Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2155339Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2155647Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2155793Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2156093Z [rank1]:E1204 13:32:11.748000 423313 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2156250Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2156551Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2156710Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2157007Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2157155Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2157465Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2157626Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2158153Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:38:32.2158277Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2158489Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2158897Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2159020Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2159260Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2159438Z [rank1]:E1204 13:32:11.748000 423313 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2159479Z dist init r=1, world=4 2025-12-04T13:38:32.2159703Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2159877Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2160192Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2160375Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2160680Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2160815Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2161113Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2161273Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2161571Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2161729Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2162040Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2162189Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2162491Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2162650Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2163177Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2163301Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2163535Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2163933Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2164071Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2164299Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2164474Z [rank2]:E1204 13:32:11.750000 423314 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2164517Z dist init r=2, world=4 2025-12-04T13:38:32.2164666Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2164845Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2165169Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2165338Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2165645Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2165779Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2166077Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2166236Z [rank0]:E1204 13:32:11.797000 423312 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2166545Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2166706Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2167003Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2167152Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2167454Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2167616Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2168157Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2168292Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2168503Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2168901Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2169024Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2169251Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2169430Z [rank0]:E1204 13:32:11.797000 423312 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2169483Z dist init r=0, world=4 2025-12-04T13:38:32.2169897Z [rank0]:[W1204 13:32:12.085954180 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2169941Z FAILED [9.8199s] [ 9%] 2025-12-04T13:38:32.2169944Z 2025-12-04T13:38:32.2170005Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2170126Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2170176Z Traceback (most recent call last): 2025-12-04T13:38:32.2170356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2170403Z self._join_processes(fn) 2025-12-04T13:38:32.2170591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2170650Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2170846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2170893Z raise RuntimeError(error) 2025-12-04T13:38:32.2171004Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2171053Z Traceback (most recent call last): 2025-12-04T13:38:32.2171228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2171274Z getattr(self, test_name)() 2025-12-04T13:38:32.2171447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2171486Z fn() 2025-12-04T13:38:32.2171654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2171698Z method(*args, **kwargs) 2025-12-04T13:38:32.2171861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2171904Z method(*args, **kwargs) 2025-12-04T13:38:32.2172071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2172111Z with policy(): 2025-12-04T13:38:32.2172281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2172338Z raise RuntimeError(msg) 2025-12-04T13:38:32.2172733Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2172750Z 2025-12-04T13:38:32.2172832Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2173095Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2173098Z 2025-12-04T13:38:32.2173195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2173197Z 2025-12-04T13:38:32.2173260Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2173311Z Traceback (most recent call last): 2025-12-04T13:38:32.2173488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2173535Z getattr(self, test_name)() 2025-12-04T13:38:32.2173723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2173762Z fn() 2025-12-04T13:38:32.2173926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2173971Z method(*args, **kwargs) 2025-12-04T13:38:32.2174134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2174179Z method(*args, **kwargs) 2025-12-04T13:38:32.2174342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2174382Z with policy(): 2025-12-04T13:38:32.2174549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2174594Z raise RuntimeError(msg) 2025-12-04T13:38:32.2174987Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2174990Z 2025-12-04T13:38:32.2175068Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2175343Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2175345Z 2025-12-04T13:38:32.2175439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2175441Z 2025-12-04T13:38:32.2175443Z 2025-12-04T13:38:32.2175526Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2175623Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.2175874Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-58ee2716c244f3f6.xml - 2025-12-04T13:38:32.2175942Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2176221Z FAILED [9.8199s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2176271Z Traceback (most recent call last): 2025-12-04T13:38:32.2176459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2176506Z getattr(self, test_name)() 2025-12-04T13:38:32.2176677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2176733Z fn() 2025-12-04T13:38:32.2176895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2176939Z method(*args, **kwargs) 2025-12-04T13:38:32.2177103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2177149Z method(*args, **kwargs) 2025-12-04T13:38:32.2177311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2177352Z with policy(): 2025-12-04T13:38:32.2177516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2177562Z raise RuntimeError(msg) 2025-12-04T13:38:32.2177955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2177973Z
2025-12-04T13:38:32.2178052Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2178315Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda
2025-12-04T13:38:32.2178317Z
2025-12-04T13:38:32.2178408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2178410Z
2025-12-04T13:38:32.2178474Z Process 3 exited with error code 10 and exception:
2025-12-04T13:38:32.2178523Z Traceback (most recent call last):
2025-12-04T13:38:32.2178702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2178748Z getattr(self, test_name)()
2025-12-04T13:38:32.2178921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2178958Z fn()
2025-12-04T13:38:32.2179122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2179165Z method(*args, **kwargs)
2025-12-04T13:38:32.2179347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2179390Z method(*args, **kwargs)
2025-12-04T13:38:32.2179556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2179637Z with policy():
2025-12-04T13:38:32.2179805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2179852Z raise RuntimeError(msg)
2025-12-04T13:38:32.2180238Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 24064 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536.
2025-12-04T13:38:32.2180240Z
2025-12-04T13:38:32.2180323Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2180583Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda
2025-12-04T13:38:32.2180585Z
2025-12-04T13:38:32.2180697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2180767Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T13:38:32.2180851Z ======================= 1 failed, 22 deselected in 9.98s =======================
2025-12-04T13:38:32.2180891Z Got exit code 1
2025-12-04T13:38:32.2180936Z Retrying single test...
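The repro lines above run the failing test under PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, and each failure message compares caching-allocator bytes before and after the test body ("Caching allocator allocated memory was 512 and is now reported as 24064 on device 3"). As a rough, hypothetical sketch of that before/after comparison only, and not the actual leak-check policy in torch/testing/_internal/common_utils.py, a minimal check built on torch.cuda.memory_allocated() could look like the following (the assert_no_cuda_leak name and tolerance_bytes threshold are illustrative, not part of the test suite):

    # Hedged sketch: mirrors the general before/after idea of the leak checker.
    # The real policy also tracks CUDA driver allocations; this only watches the
    # caching allocator via torch.cuda.memory_allocated().
    import gc
    from contextlib import contextmanager

    import torch

    @contextmanager
    def assert_no_cuda_leak(device: int = 0, tolerance_bytes: int = 0):
        torch.cuda.synchronize(device)
        gc.collect()
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)  # bytes held before the test body
        try:
            yield
        finally:
            torch.cuda.synchronize(device)
            gc.collect()
            torch.cuda.empty_cache()
            after = torch.cuda.memory_allocated(device)  # bytes still held afterwards
            if after - before > tolerance_bytes:
                raise RuntimeError(
                    f"possible CUDA memory leak: allocated was {before} "
                    f"and is now {after} on device {device}"
                )

In the runs above the post-test counter never returns to the 512-byte baseline on any rank, which is exactly the condition such a check raises on.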
2025-12-04T13:38:32.2181141Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b3ca00c9ebbe8c37.xml 2025-12-04T13:38:32.2181206Z ============================= test session starts ============================== 2025-12-04T13:38:32.2181330Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2181375Z cachedir: .pytest_cache 2025-12-04T13:38:32.2181546Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2181599Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2181643Z configfile: pytest.ini 2025-12-04T13:38:32.2181822Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2181918Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2182176Z stepcurrent: skipping 22 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2182225Z Running 1 items in this shard 2025-12-04T13:38:32.2182227Z 2025-12-04T13:38:32.2182570Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 13:32:16.063000 423645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 423714 2025-12-04T13:38:32.2182740Z I1204 13:32:16.063000 423645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 423715 2025-12-04T13:38:32.2182903Z I1204 13:32:16.064000 423645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 423716 2025-12-04T13:38:32.2183068Z I1204 13:32:16.064000 423645 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 423717 2025-12-04T13:38:32.2183703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2183743Z _warn_cpu_init() 2025-12-04T13:38:32.2184063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2184108Z return func(*args, **kwargs) 2025-12-04T13:38:32.2184736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2184774Z _warn_cpu_init() 2025-12-04T13:38:32.2185397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2185452Z _warn_cpu_init() 2025-12-04T13:38:32.2186064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2186105Z _warn_cpu_init() 2025-12-04T13:38:32.2186258Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2186436Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2186749Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2186930Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2187241Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2187375Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2187675Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2187836Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2188133Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2188302Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2188602Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2188751Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2189052Z [rank1]:E1204 13:32:23.966000 
423715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2189216Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2189784Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2189924Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2190136Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2190553Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2190678Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2190904Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2191083Z [rank1]:E1204 13:32:23.966000 423715 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2191125Z dist init r=1, world=4 2025-12-04T13:38:32.2191275Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2191460Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2191773Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2191939Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2192250Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2192389Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2192687Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2192864Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2193162Z 
[rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2193325Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2193623Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2193778Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2194085Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2194246Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2194794Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2194931Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2195145Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2195547Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2195670Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2195901Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2196099Z [rank3]:E1204 13:32:23.977000 423717 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2196143Z dist init r=3, world=4 2025-12-04T13:38:32.2196292Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2196468Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2196777Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2196945Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T13:38:32.2197257Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2197392Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2197705Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2197865Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2198164Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2198323Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2198622Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2198768Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2199088Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2199250Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2199827Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2199953Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2200162Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2200567Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2200709Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2200936Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2201116Z [rank2]:E1204 13:32:23.998000 423716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2201157Z dist init r=2, world=4 2025-12-04T13:38:32.2201307Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2201480Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2201791Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2201958Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2202281Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2202418Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2202717Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2202878Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2203175Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2203334Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2203648Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2203794Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2204111Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2204271Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2204800Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2204926Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2205135Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2205549Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2205670Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2205899Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2206076Z [rank0]:E1204 13:32:24.012000 423714 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2206118Z dist init r=0, world=4 2025-12-04T13:38:32.2206482Z [rank0]:[W1204 13:32:24.276152950 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2206525Z FAILED [9.9199s] [100%] 2025-12-04T13:38:32.2206527Z 2025-12-04T13:38:32.2206589Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2206718Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2206771Z Traceback (most recent call last): 2025-12-04T13:38:32.2206948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2206997Z self._join_processes(fn) 2025-12-04T13:38:32.2207184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2207246Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2207441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2207488Z raise RuntimeError(error) 2025-12-04T13:38:32.2207574Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2207624Z Traceback (most recent call last): 2025-12-04T13:38:32.2207805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2207852Z getattr(self, test_name)() 2025-12-04T13:38:32.2208034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2208074Z fn() 2025-12-04T13:38:32.2208237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2208296Z method(*args, **kwargs) 2025-12-04T13:38:32.2208460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2208504Z method(*args, **kwargs) 2025-12-04T13:38:32.2208667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2208709Z with policy(): 2025-12-04T13:38:32.2208875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2208919Z raise RuntimeError(msg) 2025-12-04T13:38:32.2209313Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
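Note: the ProcessGroupNCCL warning above asks for an explicit shutdown before the process exits. A minimal sketch of such a teardown, assuming the default process group was created with torch.distributed (illustrative, not code from this test suite):

    import torch.distributed as dist

    def teardown():
        # Explicitly destroy the default process group before program exit,
        # as the ProcessGroupNCCL warning recommends.
        if dist.is_available() and dist.is_initialized():
            dist.barrier()
            dist.destroy_process_group()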
2025-12-04T13:38:32.2209331Z 2025-12-04T13:38:32.2209411Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2209733Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2209736Z 2025-12-04T13:38:32.2209830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2209833Z 2025-12-04T13:38:32.2209900Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2209948Z Traceback (most recent call last): 2025-12-04T13:38:32.2210126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2210172Z getattr(self, test_name)() 2025-12-04T13:38:32.2210346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2210386Z fn() 2025-12-04T13:38:32.2210549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2210592Z method(*args, **kwargs) 2025-12-04T13:38:32.2210755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2210800Z method(*args, **kwargs) 2025-12-04T13:38:32.2210978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2211020Z with policy(): 2025-12-04T13:38:32.2211184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2211231Z raise RuntimeError(msg) 2025-12-04T13:38:32.2211618Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2211622Z 2025-12-04T13:38:32.2211702Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2211965Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2211967Z 2025-12-04T13:38:32.2212064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2212066Z 2025-12-04T13:38:32.2212068Z 2025-12-04T13:38:32.2212165Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2212259Z Process 1 terminated with exit code 10, terminating remaining processes. 
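Note: the _join_processes/_check_return_codes frames above come from a multi-process test harness that spawns one child per rank, joins them, and re-raises if any rank exited non-zero (exit code 10 here is the leak-check failure in the child). A hedged sketch of that pattern using the standard library (not the common_distributed.py implementation; names are illustrative):

    import multiprocessing as mp

    def join_and_check(target, world_size=4):
        # One child process per rank runs the test body.
        procs = [mp.Process(target=target, args=(rank, world_size))
                 for rank in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # The parent surfaces the first non-zero exit code as a test failure.
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")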
2025-12-04T13:38:32.2212518Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b3ca00c9ebbe8c37.xml - 2025-12-04T13:38:32.2212605Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2212885Z FAILED [9.9199s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2212934Z Traceback (most recent call last): 2025-12-04T13:38:32.2213114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2213158Z getattr(self, test_name)() 2025-12-04T13:38:32.2213334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2213371Z fn() 2025-12-04T13:38:32.2213537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2213593Z method(*args, **kwargs) 2025-12-04T13:38:32.2213758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2213800Z method(*args, **kwargs) 2025-12-04T13:38:32.2213965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2214005Z with policy(): 2025-12-04T13:38:32.2214172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2214215Z raise RuntimeError(msg) 2025-12-04T13:38:32.2214609Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:38:32.2214612Z 2025-12-04T13:38:32.2214693Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2214959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2214961Z 2025-12-04T13:38:32.2215067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2215069Z 2025-12-04T13:38:32.2215132Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2215183Z Traceback (most recent call last): 2025-12-04T13:38:32.2215359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2215406Z getattr(self, test_name)() 2025-12-04T13:38:32.2215578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2215618Z fn() 2025-12-04T13:38:32.2215779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2215823Z method(*args, **kwargs) 2025-12-04T13:38:32.2215984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2216029Z method(*args, **kwargs) 2025-12-04T13:38:32.2216195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2216234Z with policy(): 2025-12-04T13:38:32.2216412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2216456Z raise RuntimeError(msg) 2025-12-04T13:38:32.2216844Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2216859Z 2025-12-04T13:38:32.2216937Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2217200Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2217203Z 2025-12-04T13:38:32.2217295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2217366Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2217434Z ====================== 1 failed, 32 deselected in 10.08s ======================= 2025-12-04T13:38:32.2217475Z Got exit code 1 2025-12-04T13:38:32.2217519Z Retrying single test... 
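Note: after the shard fails, the runner re-runs the failing test by itself; if that re-run also fails, the test is later marked FAILED CONSISTENTLY and the run continues (continue-through-error). A hedged sketch of that control flow, with placeholder command lists and helper name, not the actual run_test.py logic:

    import subprocess

    def run_then_retry(shard_cmd, single_test_cmd, test_id):
        # Run the shard; on failure, retry only the failing test.
        if subprocess.call(shard_cmd) == 0:
            return True
        print("Got exit code 1")
        print("Retrying single test...")
        if subprocess.call(single_test_cmd) == 0:
            return True
        # Matches the "FAILED CONSISTENTLY" line seen later in this log.
        print(f"FAILED CONSISTENTLY: {test_id}")
        return False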
2025-12-04T13:38:32.2217735Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3a279b5f32a64f35.xml 2025-12-04T13:38:32.2217798Z ============================= test session starts ============================== 2025-12-04T13:38:32.2217921Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2217967Z cachedir: .pytest_cache 2025-12-04T13:38:32.2218139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2218190Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2218233Z configfile: pytest.ini 2025-12-04T13:38:32.2218413Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2218492Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2218750Z stepcurrent: skipping 22 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2218798Z Running 1 items in this shard 2025-12-04T13:38:32.2218800Z 2025-12-04T13:38:32.2219157Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda I1204 13:32:28.446000 424047 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 424116 2025-12-04T13:38:32.2219324Z I1204 13:32:28.447000 424047 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 424117 2025-12-04T13:38:32.2219490Z I1204 13:32:28.448000 424047 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 424118 2025-12-04T13:38:32.2219688Z I1204 13:32:28.448000 424047 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 424119 2025-12-04T13:38:32.2220321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2220363Z _warn_cpu_init() 2025-12-04T13:38:32.2220695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2220743Z return func(*args, **kwargs) 2025-12-04T13:38:32.2221356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
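Note: the UserWarning above recommends passing the device_id argument to FSDP so sharding initialization (and sync_module_states=True) runs on the GPU instead of the CPU. A minimal sketch, assuming a process group is already initialized; the wrapper is illustrative and not taken from this test file:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module, rank):
        # device_id lets FSDP move the CPU-resident module to the GPU for
        # sharding init, which sync_module_states=True requires.
        return FSDP(
            module,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )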
2025-12-04T13:38:32.2221412Z _warn_cpu_init() 2025-12-04T13:38:32.2222026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2222065Z _warn_cpu_init() 2025-12-04T13:38:32.2222682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2222737Z _warn_cpu_init() 2025-12-04T13:38:32.2222892Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2223068Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2223389Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2223560Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2223869Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2224020Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2224318Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2224479Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2224780Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2224941Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2225244Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2225401Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2225702Z [rank2]:E1204 13:32:36.426000 
424118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2225873Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2226404Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2226528Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2226744Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2227154Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2227288Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2227521Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2227697Z [rank2]:E1204 13:32:36.426000 424118 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2227743Z dist init r=2, world=4 2025-12-04T13:38:32.2227890Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2228064Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2228375Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2228552Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2228859Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2228993Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2229291Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2229451Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2229919Z 
[rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2230078Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2230400Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2230562Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2230860Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2231021Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2231545Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2231671Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2231897Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2232296Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2232419Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2232647Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2232826Z [rank1]:E1204 13:32:36.430000 424117 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2232868Z dist init r=1, world=4 2025-12-04T13:38:32.2233016Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2233186Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2233508Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2233675Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T13:38:32.2233981Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2234118Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2234415Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2234575Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2234883Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2235043Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2235355Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2235502Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2235802Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2235961Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2236486Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:38:32.2236621Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2236832Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2237231Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2237351Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2237583Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2237759Z [rank3]:E1204 13:32:36.434000 424119 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2237801Z dist init r=3, world=4 2025-12-04T13:38:32.2237957Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2238131Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2238439Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2238607Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2238915Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2239047Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2239359Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2239518Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2239877Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2240035Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2240339Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2240487Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2240786Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2240964Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2241491Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2241615Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2241826Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2242227Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2242352Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2242593Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2242772Z [rank0]:E1204 13:32:36.485000 424116 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2242813Z dist init r=0, world=4 2025-12-04T13:38:32.2243177Z [rank0]:[W1204 13:32:36.769961927 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2243220Z FAILED [9.9200s] [100%] 2025-12-04T13:38:32.2243223Z 2025-12-04T13:38:32.2243284Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2243403Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2243455Z Traceback (most recent call last): 2025-12-04T13:38:32.2243630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2243679Z self._join_processes(fn) 2025-12-04T13:38:32.2243878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2243939Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2244134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2244197Z raise RuntimeError(error) 2025-12-04T13:38:32.2244284Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2244332Z Traceback (most recent call last): 2025-12-04T13:38:32.2244508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2244554Z getattr(self, test_name)() 2025-12-04T13:38:32.2244727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2244764Z fn() 2025-12-04T13:38:32.2244932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2244976Z method(*args, **kwargs) 2025-12-04T13:38:32.2245142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2245197Z method(*args, **kwargs) 2025-12-04T13:38:32.2245364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2245403Z with policy(): 2025-12-04T13:38:32.2245570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2245615Z raise RuntimeError(msg) 2025-12-04T13:38:32.2246010Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:38:32.2246012Z 2025-12-04T13:38:32.2246093Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2246359Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2246362Z 2025-12-04T13:38:32.2246458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2246460Z 2025-12-04T13:38:32.2246522Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2246573Z Traceback (most recent call last): 2025-12-04T13:38:32.2246764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2246811Z getattr(self, test_name)() 2025-12-04T13:38:32.2246984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2247023Z fn() 2025-12-04T13:38:32.2247187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2247233Z method(*args, **kwargs) 2025-12-04T13:38:32.2247396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2247440Z method(*args, **kwargs) 2025-12-04T13:38:32.2247602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2247644Z with policy(): 2025-12-04T13:38:32.2247809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2247854Z raise RuntimeError(msg) 2025-12-04T13:38:32.2248258Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2248274Z 2025-12-04T13:38:32.2248353Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2248616Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2248619Z 2025-12-04T13:38:32.2248712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2248714Z 2025-12-04T13:38:32.2248716Z 2025-12-04T13:38:32.2248799Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2248891Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.2249149Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3a279b5f32a64f35.xml - 2025-12-04T13:38:32.2249214Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2249510Z FAILED [9.9200s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2249560Z Traceback (most recent call last): 2025-12-04T13:38:32.2249769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2249816Z getattr(self, test_name)() 2025-12-04T13:38:32.2249989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2250028Z fn() 2025-12-04T13:38:32.2250194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2250238Z method(*args, **kwargs) 2025-12-04T13:38:32.2250404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2250449Z method(*args, **kwargs) 2025-12-04T13:38:32.2250611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2250652Z with policy(): 2025-12-04T13:38:32.2250833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2250879Z raise RuntimeError(msg) 2025-12-04T13:38:32.2251269Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:38:32.2251271Z 2025-12-04T13:38:32.2251350Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2251614Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2251618Z 2025-12-04T13:38:32.2251710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2251712Z 2025-12-04T13:38:32.2251776Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2251825Z Traceback (most recent call last): 2025-12-04T13:38:32.2252002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2252046Z getattr(self, test_name)() 2025-12-04T13:38:32.2252234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2252271Z fn() 2025-12-04T13:38:32.2252450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2252494Z method(*args, **kwargs) 2025-12-04T13:38:32.2252659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2252700Z method(*args, **kwargs) 2025-12-04T13:38:32.2252867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2252906Z with policy(): 2025-12-04T13:38:32.2253071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2253114Z raise RuntimeError(msg) 2025-12-04T13:38:32.2253506Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2253523Z 2025-12-04T13:38:32.2253603Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2253863Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2253865Z 2025-12-04T13:38:32.2253960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2254029Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.2254098Z ====================== 1 failed, 32 deselected in 10.08s ======================= 2025-12-04T13:38:32.2254138Z Got exit code 1 2025-12-04T13:38:32.2254349Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2254487Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2254693Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6bd2a6a86159d468.xml 2025-12-04T13:38:32.2254756Z ============================= test session starts ============================== 2025-12-04T13:38:32.2254890Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2254934Z cachedir: .pytest_cache 2025-12-04T13:38:32.2255108Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2255157Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2255203Z configfile: pytest.ini 2025-12-04T13:38:32.2255379Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2255461Z collecting ... collected 60 items / 23 deselected / 37 selected 2025-12-04T13:38:32.2255519Z stepcurrent: skipping 23 already run items. 2025-12-04T13:38:32.2255567Z Running 10 items in this shard 2025-12-04T13:38:32.2255569Z 2025-12-04T13:38:32.2255915Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 13:32:40.792000 424449 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 424518 2025-12-04T13:38:32.2256081Z I1204 13:32:40.792000 424449 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 424519 2025-12-04T13:38:32.2256257Z I1204 13:32:40.793000 424449 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 424520 2025-12-04T13:38:32.2256419Z I1204 13:32:40.793000 424449 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 424521 2025-12-04T13:38:32.2257057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2257099Z _warn_cpu_init() 2025-12-04T13:38:32.2257715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2257770Z _warn_cpu_init() 2025-12-04T13:38:32.2258383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2258425Z _warn_cpu_init() 2025-12-04T13:38:32.2259030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2259073Z _warn_cpu_init() 2025-12-04T13:38:32.2259387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2259432Z return func(*args, **kwargs) 2025-12-04T13:38:32.2259638Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2259812Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2260125Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2260293Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2260604Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2260740Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2261053Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2261214Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2261533Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2261693Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2261992Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2262141Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2262439Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2262625Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2263154Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:38:32.2263277Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2263491Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2263889Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2264014Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2264256Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2264431Z [rank1]:E1204 13:32:48.448000 424519 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2264474Z dist init r=1, world=4 2025-12-04T13:38:32.2264620Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2264794Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2265103Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2265272Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2265581Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2265726Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2266026Z [rank2]:E1204 13:32:48.454000 424520 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2266196Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2266494Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2266652Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2266950Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2267097Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2267409Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2267570Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2268093Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 
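Note: the c10d barrier() UserWarning above suggests specifying device_id in init_process_group so collectives know which device to use. A minimal sketch with placeholder rendezvous settings (illustrative only, assumes one GPU per rank):

    import torch
    import torch.distributed as dist

    def init(rank, world_size):
        # Binding the group to a device at init time mutes the barrier() warning.
        dist.init_process_group(
            backend="nccl",
            init_method="env://",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )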
2025-12-04T13:38:32.2268217Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2268427Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2268834Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2268956Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2269186Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2269363Z [rank2]:E1204 13:32:48.454000 424520 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2269405Z dist init r=2, world=4 2025-12-04T13:38:32.2269556Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2269743Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2270054Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2270233Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2270541Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2270691Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2270989Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2271150Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2271447Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2271606Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2271917Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2272065Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2272365Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2272526Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2273052Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2273175Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2273401Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2273796Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2273917Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2274145Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2274321Z [rank3]:E1204 13:32:48.501000 424521 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2274363Z dist init r=3, world=4 2025-12-04T13:38:32.2274510Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2274681Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2275000Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2275179Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2275490Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2275623Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2275920Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2276081Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.2276380Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2276548Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2276847Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2276992Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2277292Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2277450Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2277988Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2278111Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2278321Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2278719Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2278841Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2279071Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2279245Z [rank0]:E1204 13:32:48.509000 424518 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2279287Z dist init r=0, world=4 2025-12-04T13:38:32.2279710Z [rank0]:[W1204 13:32:48.836988417 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2279773Z FAILED [9.6197s] [ 10%] 2025-12-04T13:38:32.2279775Z 2025-12-04T13:38:32.2279836Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2279951Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2280001Z Traceback (most recent call last): 2025-12-04T13:38:32.2280177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2280225Z self._join_processes(fn) 2025-12-04T13:38:32.2280412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2280471Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2280664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2280727Z raise RuntimeError(error) 2025-12-04T13:38:32.2280811Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2280860Z Traceback (most recent call last): 2025-12-04T13:38:32.2281033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2281079Z getattr(self, test_name)() 2025-12-04T13:38:32.2281251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2281288Z fn() 2025-12-04T13:38:32.2281453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2281498Z method(*args, **kwargs) 2025-12-04T13:38:32.2281661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2281706Z method(*args, **kwargs) 2025-12-04T13:38:32.2281871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2281910Z with policy(): 2025-12-04T13:38:32.2282076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2282119Z raise RuntimeError(msg) 2025-12-04T13:38:32.2282523Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
2025-12-04T13:38:32.2282526Z 2025-12-04T13:38:32.2282606Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2282869Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2282874Z 2025-12-04T13:38:32.2282967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2282969Z 2025-12-04T13:38:32.2282971Z 2025-12-04T13:38:32.2283052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2283148Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2283398Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6bd2a6a86159d468.xml - 2025-12-04T13:38:32.2283474Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2283750Z FAILED [9.6197s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2283813Z Traceback (most recent call last): 2025-12-04T13:38:32.2283988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2284034Z getattr(self, test_name)() 2025-12-04T13:38:32.2284206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2284245Z fn() 2025-12-04T13:38:32.2284406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2284450Z method(*args, **kwargs) 2025-12-04T13:38:32.2284614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2284656Z method(*args, **kwargs) 2025-12-04T13:38:32.2284818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2284871Z with policy(): 2025-12-04T13:38:32.2285036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2285081Z raise RuntimeError(msg) 2025-12-04T13:38:32.2285476Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:38:32.2285479Z 2025-12-04T13:38:32.2285557Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2285819Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2285822Z 2025-12-04T13:38:32.2285914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2285984Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
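Note on the failure above: it is raised by the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables, which records caching-allocator and driver-reported memory before and after the test body and errors out when both have grown. The following is only a simplified sketch of that comparison using public torch.cuda APIs; the helper names (snapshot, check_leak) are illustrative and are not the actual wrapper internals in common_utils.py.

import torch

def snapshot(device: int):
    # Bytes currently held by the CUDA caching allocator on this device.
    alloc = torch.cuda.memory_allocated(device)
    # Driver-level view: total minus free bytes reported by the driver.
    free, total = torch.cuda.mem_get_info(device)
    return alloc, total - free

def check_leak(before, after, device: int):
    alloc_before, driver_before = before
    alloc_after, driver_after = after
    # Flag a leak only when the driver confirms the growth seen by the
    # caching allocator, mirroring the "driver API confirmed a leak" wording
    # in the log above (simplified; thresholds and retries are omitted).
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA memory leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
        )

# Illustrative usage around a test body:
#   before = snapshot(dev)
#   run_test_body()
#   torch.cuda.synchronize(dev)
#   check_leak(before, snapshot(dev), dev)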
2025-12-04T13:38:32.2286049Z ======================= 1 failed, 23 deselected in 9.78s ======================= 2025-12-04T13:38:32.2286089Z Got exit code 1 2025-12-04T13:38:32.2286130Z Retrying single test... 2025-12-04T13:38:32.2286345Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d1354b33471f0cd9.xml 2025-12-04T13:38:32.2286407Z ============================= test session starts ============================== 2025-12-04T13:38:32.2286532Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2286575Z cachedir: .pytest_cache 2025-12-04T13:38:32.2286746Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2286796Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2286840Z configfile: pytest.ini 2025-12-04T13:38:32.2287013Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2287092Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2287346Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2287393Z Running 1 items in this shard 2025-12-04T13:38:32.2287395Z 2025-12-04T13:38:32.2287750Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 13:32:52.870000 424851 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 424920 2025-12-04T13:38:32.2287929Z I1204 13:32:52.870000 424851 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 424921 2025-12-04T13:38:32.2288094Z I1204 13:32:52.871000 424851 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 424922 2025-12-04T13:38:32.2288255Z I1204 13:32:52.871000 424851 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 424923 2025-12-04T13:38:32.2288883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2288924Z _warn_cpu_init() 2025-12-04T13:38:32.2289540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2289648Z _warn_cpu_init() 2025-12-04T13:38:32.2290262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2290302Z _warn_cpu_init() 2025-12-04T13:38:32.2290932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2290971Z _warn_cpu_init() 2025-12-04T13:38:32.2291285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2291330Z return func(*args, **kwargs) 2025-12-04T13:38:32.2291485Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2291659Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2291973Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2292139Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2292461Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2292597Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2292909Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2293070Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2293368Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2293528Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2293825Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2293987Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2294288Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2294448Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2294976Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2295101Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2295311Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2295723Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2295847Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2296078Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2296257Z [rank3]:E1204 13:33:00.498000 424923 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2296301Z dist init r=3, world=4 2025-12-04T13:38:32.2296448Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2296623Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2296932Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2297119Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2297426Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2297575Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2297877Z [rank2]:E1204 13:33:00.551000 424922 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2298036Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2298337Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2298496Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2298807Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2298953Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2299252Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2299413Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2299982Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 
2025-12-04T13:38:32.2300107Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2300338Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2300743Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2300865Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2301094Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2301272Z [rank2]:E1204 13:33:00.551000 424922 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2301314Z dist init r=2, world=4 2025-12-04T13:38:32.2301462Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2301653Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2301963Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2302147Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2302455Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2302587Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2302887Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2303048Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2303365Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2303524Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2303823Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2303970Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2304269Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2304430Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2304965Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 2025-12-04T13:38:32.2305086Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2305297Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2305699Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2305822Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2306050Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2306225Z [rank1]:E1204 13:33:00.571000 424921 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2306278Z dist init r=1, world=4 2025-12-04T13:38:32.2306424Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2306606Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2306916Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2307082Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2307388Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2307522Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2307822Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2307992Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.2308292Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2308449Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2308746Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2308892Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2309190Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2309359Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2309933Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2310057Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2310266Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2310663Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2310783Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2311033Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2311227Z [rank0]:E1204 13:33:00.573000 424920 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2311268Z dist init r=0, world=4 2025-12-04T13:38:32.2311632Z [rank0]:[W1204 13:33:00.970686317 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2311674Z FAILED [9.5197s] [100%] 2025-12-04T13:38:32.2311676Z 2025-12-04T13:38:32.2311736Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2311850Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2311901Z Traceback (most recent call last): 2025-12-04T13:38:32.2312076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2312125Z self._join_processes(fn) 2025-12-04T13:38:32.2312325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2312383Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2312574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2312622Z raise RuntimeError(error) 2025-12-04T13:38:32.2312707Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2312755Z Traceback (most recent call last): 2025-12-04T13:38:32.2312929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2312975Z getattr(self, test_name)() 2025-12-04T13:38:32.2313147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2313186Z fn() 2025-12-04T13:38:32.2313351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2313395Z method(*args, **kwargs) 2025-12-04T13:38:32.2313559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2313601Z method(*args, **kwargs) 2025-12-04T13:38:32.2313791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2313830Z with policy(): 2025-12-04T13:38:32.2313997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2314040Z raise RuntimeError(msg) 2025-12-04T13:38:32.2314430Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 
2025-12-04T13:38:32.2314435Z 2025-12-04T13:38:32.2314514Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2314776Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2314779Z 2025-12-04T13:38:32.2314872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2314876Z 2025-12-04T13:38:32.2314878Z 2025-12-04T13:38:32.2314968Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2315063Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2315312Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d1354b33471f0cd9.xml - 2025-12-04T13:38:32.2315391Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2315673Z FAILED [9.5197s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2315723Z Traceback (most recent call last): 2025-12-04T13:38:32.2315900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2315947Z getattr(self, test_name)() 2025-12-04T13:38:32.2316119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2316156Z fn() 2025-12-04T13:38:32.2316321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2316376Z method(*args, **kwargs) 2025-12-04T13:38:32.2316537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2316581Z method(*args, **kwargs) 2025-12-04T13:38:32.2316743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2316784Z with policy(): 2025-12-04T13:38:32.2316946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2316989Z raise RuntimeError(msg) 2025-12-04T13:38:32.2317378Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2317383Z 2025-12-04T13:38:32.2317460Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2317721Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2317723Z 2025-12-04T13:38:32.2317826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2317898Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
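Note on the warnings repeated in each retry above: the FSDP _warn_cpu_init() UserWarning recommends passing device_id so sharding initialization runs on GPU, and the ProcessGroupNCCL warning flags that destroy_process_group() was never called before exit. The sketch below only illustrates a rank following both recommendations; the toy Linear module is a placeholder, and the launcher (e.g. torchrun) is assumed to set RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT.

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # RANK/WORLD_SIZE (and MASTER_ADDR/MASTER_PORT) are assumed to be set by
    # the launcher, e.g. torchrun --nproc-per-node=4 this_script.py
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    device = torch.device("cuda", rank % torch.cuda.device_count())

    # Passing device_id here is what the barrier() UserWarning above suggests.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)

    model = torch.nn.Linear(8, 8)  # placeholder module, constructed on CPU
    # device_id tells FSDP to move the module to GPU for sharding init instead
    # of the slower CPU initialization path the warning describes.
    fsdp_model = FSDP(model, device_id=device)

    loss = fsdp_model(torch.randn(4, 8, device=device)).sum()
    loss.backward()

    # Explicit shutdown avoids the ProcessGroupNCCL resource-leak warning.
    dist.destroy_process_group()

if __name__ == "__main__":
    main()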
2025-12-04T13:38:32.2317963Z ======================= 1 failed, 32 deselected in 9.68s ======================= 2025-12-04T13:38:32.2318003Z Got exit code 1 2025-12-04T13:38:32.2318045Z Retrying single test... 2025-12-04T13:38:32.2318249Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-611c1cc009937671.xml 2025-12-04T13:38:32.2318312Z ============================= test session starts ============================== 2025-12-04T13:38:32.2318434Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2318476Z cachedir: .pytest_cache 2025-12-04T13:38:32.2318649Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2318698Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2318742Z configfile: pytest.ini 2025-12-04T13:38:32.2318917Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2319007Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2319261Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2319319Z Running 1 items in this shard 2025-12-04T13:38:32.2319322Z 2025-12-04T13:38:32.2319701Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda I1204 13:33:05.115000 425253 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 425322 2025-12-04T13:38:32.2319871Z I1204 13:33:05.116000 425253 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 425323 2025-12-04T13:38:32.2320038Z I1204 13:33:05.116000 425253 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 425324 2025-12-04T13:38:32.2320201Z I1204 13:33:05.117000 425253 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 425325 2025-12-04T13:38:32.2320832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2320893Z _warn_cpu_init() 2025-12-04T13:38:32.2321513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2321557Z _warn_cpu_init() 2025-12-04T13:38:32.2322170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2322215Z _warn_cpu_init() 2025-12-04T13:38:32.2322844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2322886Z _warn_cpu_init() 2025-12-04T13:38:32.2323207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2323254Z return func(*args, **kwargs) 2025-12-04T13:38:32.2323411Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2323587Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2323920Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2324089Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2324412Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2324549Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2324848Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2325012Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2325312Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2325487Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2325787Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2325941Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2326247Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2326407Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2326939Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2327076Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2327291Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2327691Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2327815Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2328046Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2328223Z [rank0]:E1204 13:33:12.748000 425322 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2328268Z dist init r=0, world=4 2025-12-04T13:38:32.2328417Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2328604Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2328912Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2329095Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2329405Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2329539Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2329896Z [rank1]:E1204 13:33:12.775000 425323 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2330057Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2330376Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2330535Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2330842Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2330989Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2331291Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2331455Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2331988Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 66048 on device 1. CUDA driver allocated memory was 2317352960 and is now 3850371072. 
2025-12-04T13:38:32.2332118Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2332330Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2332733Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2332860Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2333089Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2333283Z [rank1]:E1204 13:33:12.775000 425323 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2333325Z dist init r=1, world=4 2025-12-04T13:38:32.2333493Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2333666Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2333977Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2334143Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2334453Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2334592Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2334902Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2335064Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2335363Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2335526Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2335823Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2335975Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2336290Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2336451Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2336975Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 2300575744 and is now 3833593856. 2025-12-04T13:38:32.2337100Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2337315Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2337713Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2337851Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2338083Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2338272Z [rank2]:E1204 13:33:12.809000 425324 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2338317Z dist init r=2, world=4 2025-12-04T13:38:32.2338465Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2338640Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2338950Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2339120Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2339428Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2339611Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2339915Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2340076Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.2340376Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2340537Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2340838Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2341000Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2341302Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2341465Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2341989Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 2250244096 and is now 3783262208. 2025-12-04T13:38:32.2342114Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2342324Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2342736Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2342872Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2343103Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2343283Z [rank3]:E1204 13:33:12.839000 425325 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2343325Z dist init r=3, world=4 2025-12-04T13:38:32.2343692Z [rank0]:[W1204 13:33:12.910619684 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2343735Z FAILED [9.4220s] [100%] 2025-12-04T13:38:32.2343738Z 2025-12-04T13:38:32.2343803Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2343935Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2343989Z Traceback (most recent call last): 2025-12-04T13:38:32.2344166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2344217Z self._join_processes(fn) 2025-12-04T13:38:32.2344405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2344467Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2344661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2344712Z raise RuntimeError(error) 2025-12-04T13:38:32.2344797Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2344850Z Traceback (most recent call last): 2025-12-04T13:38:32.2345024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2345073Z getattr(self, test_name)() 2025-12-04T13:38:32.2345246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2345295Z fn() 2025-12-04T13:38:32.2345463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2345508Z method(*args, **kwargs) 2025-12-04T13:38:32.2345676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2345720Z method(*args, **kwargs) 2025-12-04T13:38:32.2345886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2345930Z with policy(): 2025-12-04T13:38:32.2346099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2346143Z raise RuntimeError(msg) 2025-12-04T13:38:32.2346535Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 
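[editor note] The ProcessGroupNCCL warning above says destroy_process_group() was never called before the worker processes exited, which can leak communicator resources. A minimal, hypothetical sketch of the recommended shutdown pattern is below; the backend and rendezvous setup are illustrative and not taken from the test harness, which wires these up itself.

import torch.distributed as dist

def run_worker(rank: int, world_size: int) -> None:
    # Illustrative setup; assumes MASTER_ADDR/MASTER_PORT are already exported.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    try:
        pass  # per-rank collective work would go here
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called"
        # warning and releases process-group resources before the process exits.
        dist.destroy_process_group()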
2025-12-04T13:38:32.2346538Z 2025-12-04T13:38:32.2346621Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2346899Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2346919Z 2025-12-04T13:38:32.2347013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2347019Z 2025-12-04T13:38:32.2347021Z 2025-12-04T13:38:32.2347102Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2347199Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2347449Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-611c1cc009937671.xml - 2025-12-04T13:38:32.2347518Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2347797Z FAILED [9.4220s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2347849Z Traceback (most recent call last): 2025-12-04T13:38:32.2348028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2348090Z getattr(self, test_name)() 2025-12-04T13:38:32.2348262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2348303Z fn() 2025-12-04T13:38:32.2348469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2348516Z method(*args, **kwargs) 2025-12-04T13:38:32.2348683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2348728Z method(*args, **kwargs) 2025-12-04T13:38:32.2348891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2348935Z with policy(): 2025-12-04T13:38:32.2349103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2349149Z raise RuntimeError(msg) 2025-12-04T13:38:32.2349547Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3986685952. 2025-12-04T13:38:32.2349550Z 2025-12-04T13:38:32.2349675Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2349943Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2349946Z 2025-12-04T13:38:32.2350040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2350113Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
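[editor note] The repeated RuntimeError above is raised by PyTorch's CUDA memory leak check, enabled for this shard via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 and the mem_leak_check test-matrix flag: it snapshots caching-allocator and driver memory before the test and fails when the numbers are higher afterwards. The following is only a rough, hypothetical illustration of that before/after comparison, not the actual common_utils implementation.

import torch

def check_for_leak(fn, device: int = 0) -> None:
    # Snapshot caching-allocator usage and driver-level free memory beforehand.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, _total = torch.cuda.mem_get_info(device)

    fn()  # run the test body

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _total = torch.cuda.mem_get_info(device)

    # Driver-allocated memory grows when free memory shrinks; flag both together.
    if alloc_after > alloc_before and free_after < free_before:
        raise RuntimeError(
            f"possible leak: allocator {alloc_before} -> {alloc_after}, "
            f"driver free {free_before} -> {free_after}"
        )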
2025-12-04T13:38:32.2350180Z ======================= 1 failed, 32 deselected in 9.58s ======================= 2025-12-04T13:38:32.2350223Z Got exit code 1 2025-12-04T13:38:32.2350431Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2350573Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2350776Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d437c85c7343d74e.xml 2025-12-04T13:38:32.2350842Z ============================= test session starts ============================== 2025-12-04T13:38:32.2350981Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2351026Z cachedir: .pytest_cache 2025-12-04T13:38:32.2351216Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2351268Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2351315Z configfile: pytest.ini 2025-12-04T13:38:32.2351491Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2351574Z collecting ... collected 60 items / 24 deselected / 36 selected 2025-12-04T13:38:32.2351631Z stepcurrent: skipping 24 already run items. 2025-12-04T13:38:32.2351681Z Running 9 items in this shard 2025-12-04T13:38:32.2351683Z 2025-12-04T13:38:32.2352019Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 13:33:17.235000 425655 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 425724 2025-12-04T13:38:32.2352191Z I1204 13:33:17.236000 425655 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 425725 2025-12-04T13:38:32.2352372Z I1204 13:33:17.237000 425655 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 425726 2025-12-04T13:38:32.2352537Z I1204 13:33:17.237000 425655 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 425727 2025-12-04T13:38:32.2352856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2352915Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2353542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2353585Z _warn_cpu_init() 2025-12-04T13:38:32.2353901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T13:38:32.2353968Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2354587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2354632Z _warn_cpu_init() 2025-12-04T13:38:32.2354943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2354999Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2355623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2355667Z _warn_cpu_init() 2025-12-04T13:38:32.2355989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2356078Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2356391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2356476Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2356789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2356867Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2357189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2357248Z return func(*args, **kwargs) 2025-12-04T13:38:32.2357560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2357613Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2358238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2358282Z _warn_cpu_init() 2025-12-04T13:38:32.2358591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2358675Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2358939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2358989Z return func(*args, **kwargs) 2025-12-04T13:38:32.2359233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2359282Z return func(*args, **kwargs) 2025-12-04T13:38:32.2359523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2359618Z return func(*args, **kwargs) 2025-12-04T13:38:32.2359858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2359907Z return func(*args, **kwargs) 2025-12-04T13:38:32.2360144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.2360193Z return func(*args, **kwargs) 2025-12-04T13:38:32.2360448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.2360506Z return func(*args, **kwargs) 2025-12-04T13:38:32.2360748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.2360793Z return func(*args, **kwargs) 2025-12-04T13:38:32.2361035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.2361079Z return func(*args, **kwargs) 2025-12-04T13:38:32.2361240Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2361418Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2361742Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2361925Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2362237Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2362375Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2362676Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2362840Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2363138Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2363313Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2363611Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2363762Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2364070Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2364233Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2364761Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
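[editor note] The _warn_cpu_init UserWarning above recommends passing the device_id argument so FSDP runs its sharding initialization on the GPU rather than the CPU. A minimal sketch of that call follows; the Linear module is a placeholder for the nested wrapped test model, and an initialized process group is assumed.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Placeholder module; the test wraps a nested model instead.
model = torch.nn.Linear(16, 16)

# device_id moves the module to the GPU before sharding, which also
# satisfies the requirement noted for sync_module_states=True.
fsdp_model = FSDP(model, device_id=torch.cuda.current_device())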
2025-12-04T13:38:32.2364905Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2365120Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2365524Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2365651Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2365886Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2366064Z [rank1]:E1204 13:33:24.933000 425725 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2366109Z dist init r=1, world=4 2025-12-04T13:38:32.2366257Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2366445Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2366757Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2366928Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2367237Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2367373Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2367676Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2367835Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2368150Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2368309Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2368611Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2368759Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2369064Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2369230Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2369826Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.2369969Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2370180Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2370574Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2370695Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2370929Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2371111Z [rank0]:E1204 13:33:24.983000 425724 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2371168Z dist init r=0, world=4 2025-12-04T13:38:32.2371321Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2371495Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2371810Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2371977Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2372288Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2372423Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2372741Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2372902Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.2373200Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2373360Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2373660Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2373811Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2374110Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2374291Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2374825Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.2374952Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2375172Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2375567Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2375693Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2375935Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2376114Z [rank3]:E1204 13:33:25.019000 425727 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2376160Z dist init r=3, world=4 2025-12-04T13:38:32.2376308Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2376485Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2376798Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2376968Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.2377289Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2377424Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2377723Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2377884Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2378188Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2378347Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2378650Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2378809Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2379111Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2379288Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2379858Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.2379985Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2380197Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2380587Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2380725Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2380956Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2381133Z [rank2]:E1204 13:33:25.024000 425726 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2381179Z dist init r=2, world=4 2025-12-04T13:38:32.2381545Z [rank0]:[W1204 13:33:25.248162483 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2381590Z FAILED [9.6210s] [ 11%] 2025-12-04T13:38:32.2381592Z 2025-12-04T13:38:32.2381656Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2381762Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T13:38:32.2381838Z Traceback (most recent call last): 2025-12-04T13:38:32.2382017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2382067Z self._join_processes(fn) 2025-12-04T13:38:32.2382255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2382316Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2382512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2382565Z raise RuntimeError(error) 2025-12-04T13:38:32.2382650Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2382703Z Traceback (most recent call last): 2025-12-04T13:38:32.2382878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2382931Z getattr(self, test_name)() 2025-12-04T13:38:32.2383102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2383143Z fn() 2025-12-04T13:38:32.2383324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2383372Z method(*args, **kwargs) 2025-12-04T13:38:32.2383552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2383600Z method(*args, **kwargs) 2025-12-04T13:38:32.2383766Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2383806Z with policy(): 2025-12-04T13:38:32.2383976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2384021Z raise RuntimeError(msg) 2025-12-04T13:38:32.2384413Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.2384415Z 2025-12-04T13:38:32.2387102Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2387364Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2387391Z 2025-12-04T13:38:32.2387486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2387488Z 2025-12-04T13:38:32.2387490Z 2025-12-04T13:38:32.2387576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2387672Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2387930Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d437c85c7343d74e.xml - 2025-12-04T13:38:32.2387997Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2388270Z FAILED [9.6210s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2388322Z Traceback (most recent call last): 2025-12-04T13:38:32.2388504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2388548Z getattr(self, test_name)() 2025-12-04T13:38:32.2388737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2388774Z fn() 2025-12-04T13:38:32.2388942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2388984Z method(*args, **kwargs) 2025-12-04T13:38:32.2389151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2389192Z method(*args, **kwargs) 2025-12-04T13:38:32.2389357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2389398Z with policy(): 2025-12-04T13:38:32.2389565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2389644Z raise RuntimeError(msg) 2025-12-04T13:38:32.2390027Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
2025-12-04T13:38:32.2390029Z 2025-12-04T13:38:32.2390128Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2390381Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2390400Z 2025-12-04T13:38:32.2390494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2390561Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2390628Z ======================= 1 failed, 24 deselected in 9.78s ======================= 2025-12-04T13:38:32.2390667Z Got exit code 1 2025-12-04T13:38:32.2390710Z Retrying single test... 2025-12-04T13:38:32.2390916Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b5832596ab7fbeec.xml 2025-12-04T13:38:32.2390979Z ============================= test session starts ============================== 2025-12-04T13:38:32.2391103Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2391147Z cachedir: .pytest_cache 2025-12-04T13:38:32.2391319Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2391383Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2391428Z configfile: pytest.ini 2025-12-04T13:38:32.2391604Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2391684Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2391933Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2391979Z Running 1 items in this shard 2025-12-04T13:38:32.2391982Z 2025-12-04T13:38:32.2392316Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 13:33:29.524000 426057 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 426126 2025-12-04T13:38:32.2392485Z I1204 13:33:29.525000 426057 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 426127 2025-12-04T13:38:32.2392648Z I1204 13:33:29.526000 426057 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 426128 2025-12-04T13:38:32.2392810Z I1204 13:33:29.526000 426057 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 426129 2025-12-04T13:38:32.2393138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2393196Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2393822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2393863Z _warn_cpu_init() 2025-12-04T13:38:32.2394177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2394230Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2394861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2394915Z _warn_cpu_init() 2025-12-04T13:38:32.2395224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2395310Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2395619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2395703Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2396018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2396079Z return func(*args, **kwargs) 2025-12-04T13:38:32.2396386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2396440Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2397060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2397099Z _warn_cpu_init() 2025-12-04T13:38:32.2397411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2397462Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2398091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2398132Z _warn_cpu_init() 2025-12-04T13:38:32.2398440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2398523Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2398830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2398912Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2399158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2399219Z return func(*args, **kwargs) 2025-12-04T13:38:32.2399458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2399519Z return func(*args, **kwargs) 2025-12-04T13:38:32.2399800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2399846Z return func(*args, **kwargs) 2025-12-04T13:38:32.2400086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2400131Z return func(*args, **kwargs) 2025-12-04T13:38:32.2400367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.2400412Z return func(*args, **kwargs) 2025-12-04T13:38:32.2400648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.2400728Z return func(*args, **kwargs) 2025-12-04T13:38:32.2400964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T13:38:32.2401007Z return func(*args, **kwargs) 2025-12-04T13:38:32.2401246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
2025-12-04T13:38:32.2401288Z return func(*args, **kwargs) 2025-12-04T13:38:32.2401446Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2401619Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2401935Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2402102Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2402434Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2402571Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2402871Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2403032Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2403331Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2403491Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2403806Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2403953Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2404275Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2404436Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2404965Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 
2025-12-04T13:38:32.2405089Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2405305Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2405705Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2405831Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2406059Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2406235Z [rank3]:E1204 13:33:37.146000 426129 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2406278Z dist init r=3, world=4 2025-12-04T13:38:32.2406426Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2406599Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2406923Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2407090Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2407400Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2407536Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2407835Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2407995Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2408295Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2408463Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2408762Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2408920Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2409221Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2409382Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2409947Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.2410088Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2410298Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2410688Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2410809Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2411039Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2411218Z [rank2]:E1204 13:33:37.147000 426128 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2411259Z dist init r=2, world=4 2025-12-04T13:38:32.2411506Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2411696Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2412008Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2412174Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2412485Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2412618Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2412919Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2413095Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.2413395Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2413573Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2413870Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2414020Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2414319Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2414482Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2415011Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 2025-12-04T13:38:32.2415134Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2415345Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2415734Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2415858Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2416085Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2416274Z [rank0]:E1204 13:33:37.202000 426126 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2416316Z dist init r=0, world=4 2025-12-04T13:38:32.2416464Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2416638Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2416948Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2417115Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.2417422Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2417557Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2417865Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2418046Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2418345Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2418503Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2418800Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2418945Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2419246Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2419416Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2419981Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 103936 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 
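[editor's note] The RuntimeError above comes from the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it compares caching-allocator and driver-level allocations before and after the test body. A rough, simplified illustration of that comparison using public torch.cuda APIs; this is an assumption about the idea, not the harness's actual implementation:

    import torch

    def memory_snapshot(device: int):
        torch.cuda.synchronize(device)
        allocator_bytes = torch.cuda.memory_allocated(device)      # caching-allocator view
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level view
        return allocator_bytes, total_bytes - free_bytes

    before = memory_snapshot(2)
    # ... test body would run here ...
    after = memory_snapshot(2)
    if after[0] > before[0] or after[1] > before[1]:
        raise RuntimeError(f"possible leak on device 2: {before} -> {after}")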
2025-12-04T13:38:32.2420107Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2420316Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2420709Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2420846Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2421074Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2421250Z [rank1]:E1204 13:33:37.213000 426127 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2421292Z dist init r=1, world=4 2025-12-04T13:38:32.2421654Z [rank0]:[W1204 13:33:37.464071535 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2421698Z FAILED [9.5199s] [100%] 2025-12-04T13:38:32.2421700Z 2025-12-04T13:38:32.2421761Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2421869Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T13:38:32.2421920Z Traceback (most recent call last): 2025-12-04T13:38:32.2422098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2422146Z self._join_processes(fn) 2025-12-04T13:38:32.2422346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2422419Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2422614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2422661Z raise RuntimeError(error) 2025-12-04T13:38:32.2422746Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2422795Z Traceback (most recent call last): 2025-12-04T13:38:32.2422971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2423018Z getattr(self, test_name)() 2025-12-04T13:38:32.2423187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2423228Z fn() 2025-12-04T13:38:32.2423391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2423438Z method(*args, **kwargs) 2025-12-04T13:38:32.2423615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2423659Z method(*args, **kwargs) 2025-12-04T13:38:32.2423821Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2423860Z with policy(): 2025-12-04T13:38:32.2424029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2424073Z raise RuntimeError(msg) 2025-12-04T13:38:32.2424459Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.2424462Z 2025-12-04T13:38:32.2424543Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2424798Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2424800Z 2025-12-04T13:38:32.2424894Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2424896Z 2025-12-04T13:38:32.2424898Z 2025-12-04T13:38:32.2424991Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2425085Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2425341Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b5832596ab7fbeec.xml - 2025-12-04T13:38:32.2425408Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2425679Z FAILED [9.5199s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2425730Z Traceback (most recent call last): 2025-12-04T13:38:32.2425906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2425953Z getattr(self, test_name)() 2025-12-04T13:38:32.2426127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2426165Z fn() 2025-12-04T13:38:32.2426339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2426384Z method(*args, **kwargs) 2025-12-04T13:38:32.2426546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2426602Z method(*args, **kwargs) 2025-12-04T13:38:32.2426763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2426804Z with policy(): 2025-12-04T13:38:32.2426967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2427013Z raise RuntimeError(msg) 2025-12-04T13:38:32.2427394Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 99840 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.2427400Z 2025-12-04T13:38:32.2427479Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2427732Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2427748Z 2025-12-04T13:38:32.2427841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2427910Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2427974Z ======================= 1 failed, 32 deselected in 9.68s ======================= 2025-12-04T13:38:32.2428016Z Got exit code 1 2025-12-04T13:38:32.2428058Z Retrying single test... 2025-12-04T13:38:32.2428261Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7e47b01e24d8b7e2.xml 2025-12-04T13:38:32.2428323Z ============================= test session starts ============================== 2025-12-04T13:38:32.2428446Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2428491Z cachedir: .pytest_cache 2025-12-04T13:38:32.2428663Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2428712Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2428757Z configfile: pytest.ini 2025-12-04T13:38:32.2428934Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2429028Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2429274Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2429320Z Running 1 items in this shard 2025-12-04T13:38:32.2429323Z 2025-12-04T13:38:32.2429694Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda I1204 13:33:41.632000 426459 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 426528 2025-12-04T13:38:32.2429864Z I1204 13:33:41.632000 426459 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 426529 2025-12-04T13:38:32.2430032Z I1204 13:33:41.633000 426459 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 426530 2025-12-04T13:38:32.2430197Z I1204 13:33:41.633000 426459 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 426531 2025-12-04T13:38:32.2430515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2430585Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2431207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2431264Z _warn_cpu_init() 2025-12-04T13:38:32.2431577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2431632Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2432245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2432305Z _warn_cpu_init() 2025-12-04T13:38:32.2432615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2432698Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2433008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2433088Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2433404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2433449Z return func(*args, **kwargs) 2025-12-04T13:38:32.2433770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2433823Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2434440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2434481Z _warn_cpu_init() 2025-12-04T13:38:32.2434791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2434844Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T13:38:32.2435492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2435533Z _warn_cpu_init() 2025-12-04T13:38:32.2435854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2435936Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2436245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T13:38:32.2436324Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T13:38:32.2436573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2436617Z return func(*args, **kwargs) 2025-12-04T13:38:32.2436858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2436914Z return func(*args, **kwargs) 2025-12-04T13:38:32.2437153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2437196Z return func(*args, **kwargs) 2025-12-04T13:38:32.2437437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2437479Z return func(*args, **kwargs) 2025-12-04T13:38:32.2437718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2437761Z return func(*args, **kwargs) 2025-12-04T13:38:32.2438000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2438043Z return func(*args, **kwargs) 2025-12-04T13:38:32.2438279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T13:38:32.2438333Z return func(*args, **kwargs) 2025-12-04T13:38:32.2438570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
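[editor's note] The FutureWarning repeated above recommends DistributedDataParallel over FSDP's deprecated `NO_SHARD` strategy. A minimal sketch of that substitution, assuming the process group is already initialized and one GPU per rank; the names are placeholders rather than the test suite's own helpers:

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def replace_no_shard_with_ddp(module: torch.nn.Module, rank: int) -> DDP:
        # NO_SHARD keeps full parameters on every rank, which is also what DDP
        # does, so the wrapper can be swapped without changing replication behavior
        module = module.to(torch.device("cuda", rank))
        return DDP(module, device_ids=[rank])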
2025-12-04T13:38:32.2438614Z return func(*args, **kwargs) 2025-12-04T13:38:32.2438771Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2438948Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2439261Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2439429Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2439790Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2439941Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2440243Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2440422Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2440721Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2440878Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2441180Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2441328Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2441643Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2441803Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2442331Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.2442456Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2442668Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2443074Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2443196Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2443427Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2443603Z [rank2]:E1204 13:33:49.354000 426530 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2443647Z dist init r=2, world=4 2025-12-04T13:38:32.2443797Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2443969Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2444283Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2444459Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2444766Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2444913Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2445212Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2445374Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2445674Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2445832Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2446145Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2446292Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2446590Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2446750Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2447263Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 3. CUDA driver allocated memory was 2250244096 and is now 3785359360. 2025-12-04T13:38:32.2447389Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2447613Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2448004Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2448125Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2448353Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2448532Z [rank3]:E1204 13:33:49.357000 426531 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2448576Z dist init r=3, world=4 2025-12-04T13:38:32.2448725Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2448900Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2449221Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2449403Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2449759Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2449893Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2450192Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2450354Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.2450655Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2450831Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2451135Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2451284Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2451582Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2451744Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2452281Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 1. CUDA driver allocated memory was 2317352960 and is now 3852468224. 2025-12-04T13:38:32.2452408Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2452620Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2453013Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2453139Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2453369Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2453549Z [rank1]:E1204 13:33:49.416000 426529 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2453591Z dist init r=1, world=4 2025-12-04T13:38:32.2453754Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2453927Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2454255Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2454421Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.2454731Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2454870Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2455167Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2455343Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2455644Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2455806Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2456103Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2456252Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2456555Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2456714Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2457244Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 2453667840 and is now 3988783104. 
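[editor's note] Each of these runs also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit, which can leak resources. A hedged sketch of the teardown that warning asks for; the wrapper function name is illustrative:

    import torch.distributed as dist

    def shutdown_distributed():
        if dist.is_available() and dist.is_initialized():
            dist.barrier()                 # let outstanding collectives finish
            dist.destroy_process_group()   # the cleanup the warning asks for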
2025-12-04T13:38:32.2457366Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2457579Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2457968Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2458094Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2458335Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2458511Z [rank0]:E1204 13:33:49.417000 426528 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2458568Z dist init r=0, world=4 2025-12-04T13:38:32.2458932Z [rank0]:[W1204 13:33:49.691287575 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2458978Z FAILED [9.6183s] [100%] 2025-12-04T13:38:32.2458981Z 2025-12-04T13:38:32.2459042Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2459154Z __ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda __ 2025-12-04T13:38:32.2459205Z Traceback (most recent call last): 2025-12-04T13:38:32.2459387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2459435Z self._join_processes(fn) 2025-12-04T13:38:32.2459683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2459756Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2459955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2460006Z raise RuntimeError(error) 2025-12-04T13:38:32.2460091Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2460145Z Traceback (most recent call last): 2025-12-04T13:38:32.2460319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2460368Z getattr(self, test_name)() 2025-12-04T13:38:32.2460541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2460582Z fn() 2025-12-04T13:38:32.2460746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2460796Z method(*args, **kwargs) 2025-12-04T13:38:32.2460960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2461008Z method(*args, **kwargs) 2025-12-04T13:38:32.2461170Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2461233Z with policy(): 2025-12-04T13:38:32.2461400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2461447Z raise RuntimeError(msg) 2025-12-04T13:38:32.2461828Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 2025-12-04T13:38:32.2461832Z 2025-12-04T13:38:32.2461918Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2462170Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2462177Z 2025-12-04T13:38:32.2462271Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2462273Z 2025-12-04T13:38:32.2462275Z 2025-12-04T13:38:32.2462361Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2462455Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2462728Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7e47b01e24d8b7e2.xml - 2025-12-04T13:38:32.2462808Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2463086Z FAILED [9.6183s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2463136Z Traceback (most recent call last): 2025-12-04T13:38:32.2463319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2463366Z getattr(self, test_name)() 2025-12-04T13:38:32.2463542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2463582Z fn() 2025-12-04T13:38:32.2463748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2463792Z method(*args, **kwargs) 2025-12-04T13:38:32.2463963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2464018Z method(*args, **kwargs) 2025-12-04T13:38:32.2464184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2464227Z with policy(): 2025-12-04T13:38:32.2464394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2464441Z raise RuntimeError(msg) 2025-12-04T13:38:32.2464820Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 95744 on device 2. CUDA driver allocated memory was 2300575744 and is now 3835691008. 
2025-12-04T13:38:32.2464823Z 2025-12-04T13:38:32.2464906Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2465159Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2465163Z 2025-12-04T13:38:32.2465259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2465327Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2465409Z ======================= 1 failed, 32 deselected in 9.78s ======================= 2025-12-04T13:38:32.2465450Z Got exit code 1 2025-12-04T13:38:32.2465650Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda 2025-12-04T13:38:32.2465789Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2465995Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-09e9a6f4547b43c6.xml 2025-12-04T13:38:32.2466063Z ============================= test session starts ============================== 2025-12-04T13:38:32.2466186Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2466234Z cachedir: .pytest_cache 2025-12-04T13:38:32.2466407Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2466461Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2466504Z configfile: pytest.ini 2025-12-04T13:38:32.2466683Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2466774Z collecting ... collected 60 items / 25 deselected / 35 selected 2025-12-04T13:38:32.2466835Z stepcurrent: skipping 25 already run items. 2025-12-04T13:38:32.2466897Z Running 8 items in this shard 2025-12-04T13:38:32.2466899Z 2025-12-04T13:38:32.2467232Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 13:33:53.811000 426861 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 426930 2025-12-04T13:38:32.2467400Z I1204 13:33:53.812000 426861 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 426931 2025-12-04T13:38:32.2467569Z I1204 13:33:53.812000 426861 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 426932 2025-12-04T13:38:32.2467732Z I1204 13:33:53.813000 426861 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 426933 2025-12-04T13:38:32.2468371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2468435Z _warn_cpu_init() 2025-12-04T13:38:32.2469052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2469098Z _warn_cpu_init() 2025-12-04T13:38:32.2469754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2469801Z _warn_cpu_init() 2025-12-04T13:38:32.2470134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2470182Z return func(*args, **kwargs) 2025-12-04T13:38:32.2470801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2470844Z _warn_cpu_init() 2025-12-04T13:38:32.2471001Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2471177Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2471491Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2471674Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2471982Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2472134Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2472434Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2472598Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2472896Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2473058Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2473376Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2473524Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2473825Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2473986Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2474503Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:38:32.2474628Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2474854Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2475242Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2475366Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2475598Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2475776Z [rank1]:E1204 13:34:01.556000 426931 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2475822Z dist init r=1, world=4 2025-12-04T13:38:32.2475970Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2476157Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2476465Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2476646Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2476956Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2477090Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2477391Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2477549Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2477861Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2478019Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2478322Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2478474Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2478773Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2478938Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2479461Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2479638Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2479850Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2480238Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2480366Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2480594Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2480776Z [rank2]:E1204 13:34:01.559000 426932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2480831Z dist init r=2, world=4 2025-12-04T13:38:32.2480984Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2481170Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2481484Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2481651Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2481961Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2482097Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2482396Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2482570Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.2482868Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2483029Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2483329Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2483480Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2483782Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2483956Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2484474Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2484598Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2484811Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2485198Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2485318Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2485559Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2485737Z [rank3]:E1204 13:34:01.567000 426933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2485803Z dist init r=3, world=4 2025-12-04T13:38:32.2485952Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2486126Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2486435Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2486605Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.2486913Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2487061Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2487362Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2487520Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2487822Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2487979Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2488283Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2488430Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2488743Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2488907Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2489417Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:38:32.2489545Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2489801Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2490205Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2490330Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2490573Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2490752Z [rank0]:E1204 13:34:01.612000 426930 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2490794Z dist init r=0, world=4 2025-12-04T13:38:32.2491161Z [rank0]:[W1204 13:34:01.870736961 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2491204Z FAILED [9.6201s] [ 12%] 2025-12-04T13:38:32.2491207Z 2025-12-04T13:38:32.2491270Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2491378Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T13:38:32.2491446Z Traceback (most recent call last): 2025-12-04T13:38:32.2491622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2491673Z self._join_processes(fn) 2025-12-04T13:38:32.2491859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2491921Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2492114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2492164Z raise RuntimeError(error) 2025-12-04T13:38:32.2492254Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2492303Z Traceback (most recent call last): 2025-12-04T13:38:32.2492481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2492528Z getattr(self, test_name)() 2025-12-04T13:38:32.2492704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2492741Z fn() 2025-12-04T13:38:32.2492908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2492967Z method(*args, **kwargs) 2025-12-04T13:38:32.2493135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2493179Z method(*args, **kwargs) 2025-12-04T13:38:32.2493347Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2493387Z with policy(): 2025-12-04T13:38:32.2493557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2493603Z raise RuntimeError(msg) 2025-12-04T13:38:32.2493982Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2493984Z 2025-12-04T13:38:32.2494066Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2494315Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2494328Z 2025-12-04T13:38:32.2494425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2494427Z 2025-12-04T13:38:32.2494492Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2494556Z Traceback (most recent call last): 2025-12-04T13:38:32.2494732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2494783Z getattr(self, test_name)() 2025-12-04T13:38:32.2494955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2494996Z fn() 2025-12-04T13:38:32.2495161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2495210Z method(*args, **kwargs) 2025-12-04T13:38:32.2495375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2495421Z method(*args, **kwargs) 2025-12-04T13:38:32.2495585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2495641Z with policy(): 2025-12-04T13:38:32.2495805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2495852Z raise RuntimeError(msg) 2025-12-04T13:38:32.2496230Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:38:32.2496235Z 2025-12-04T13:38:32.2496314Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2496564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2496566Z 2025-12-04T13:38:32.2496661Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2496664Z 2025-12-04T13:38:32.2496666Z 2025-12-04T13:38:32.2496753Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2496847Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2497114Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-09e9a6f4547b43c6.xml - 2025-12-04T13:38:32.2497180Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2497447Z FAILED [9.6201s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2497497Z Traceback (most recent call last): 2025-12-04T13:38:32.2497680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2497727Z getattr(self, test_name)() 2025-12-04T13:38:32.2497905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2497946Z fn() 2025-12-04T13:38:32.2498112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2498160Z method(*args, **kwargs) 2025-12-04T13:38:32.2498326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2498372Z method(*args, **kwargs) 2025-12-04T13:38:32.2498548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2498592Z with policy(): 2025-12-04T13:38:32.2498757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2498818Z raise RuntimeError(msg) 2025-12-04T13:38:32.2499196Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 
2025-12-04T13:38:32.2499198Z 2025-12-04T13:38:32.2499281Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2499525Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2499527Z 2025-12-04T13:38:32.2499675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2499677Z 2025-12-04T13:38:32.2499743Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2499816Z Traceback (most recent call last): 2025-12-04T13:38:32.2499996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2500041Z getattr(self, test_name)() 2025-12-04T13:38:32.2500220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2500258Z fn() 2025-12-04T13:38:32.2500427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2500470Z method(*args, **kwargs) 2025-12-04T13:38:32.2500636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2500680Z method(*args, **kwargs) 2025-12-04T13:38:32.2500844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2500886Z with policy(): 2025-12-04T13:38:32.2501055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2501100Z raise RuntimeError(msg) 2025-12-04T13:38:32.2501490Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2501493Z 2025-12-04T13:38:32.2501572Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2501820Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2501823Z 2025-12-04T13:38:32.2501918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2501988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2502059Z ======================= 1 failed, 25 deselected in 9.78s ======================= 2025-12-04T13:38:32.2502099Z Got exit code 1 2025-12-04T13:38:32.2502147Z Retrying single test... 
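[editor's note] The failures above all come from the memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: each rank compares per-device memory counters before and after the test body, and a rank that detects growth exits with code 10, which the parent's _check_return_codes then re-raises as the RuntimeError shown. In the device 1 report, for example, the caching allocator grew from 512 to 25088 bytes (24576 bytes, i.e. 24 KiB) and driver-allocated memory from 2317352960 to 3827302400 bytes (1509949440 bytes, about 1.41 GiB), and the message says the driver-level numbers are what "confirmed" the allocator growth. The following is a minimal, hedged sketch of that before/after comparison only; it is not the actual CudaMemoryLeakCheck in torch/testing/_internal/common_utils.py, which does considerably more (per-device bookkeeping, retries, and the driver-side confirmation step), and check_for_leak/test_fn are illustrative names.

    # Simplified stand-in for the leak check enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1.
    # Shows only the before/after comparison idea visible in the failure messages above.
    import gc
    import torch

    def check_for_leak(test_fn, device: int = 0, tolerance: int = 0):
        torch.cuda.synchronize(device)
        gc.collect()
        torch.cuda.empty_cache()
        allocator_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        driver_free_before, _ = torch.cuda.mem_get_info(device)  # driver-level free memory

        test_fn()  # the test body under measurement

        torch.cuda.synchronize(device)
        gc.collect()
        torch.cuda.empty_cache()
        allocator_after = torch.cuda.memory_allocated(device)
        driver_free_after, _ = torch.cuda.mem_get_info(device)

        allocator_growth = allocator_after - allocator_before
        driver_growth = driver_free_before - driver_free_after
        # Flag a leak only when the driver-level view backs up the allocator growth,
        # mirroring the "CUDA driver API confirmed a leak" wording in the log.
        if allocator_growth > tolerance and driver_growth > tolerance:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator grew by "
                f"{allocator_growth} bytes, driver-visible usage by {driver_growth} bytes"
            )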
2025-12-04T13:38:32.2502353Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-dfa5b0c3a6ae8df1.xml 2025-12-04T13:38:32.2502419Z ============================= test session starts ============================== 2025-12-04T13:38:32.2502541Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2502613Z cachedir: .pytest_cache 2025-12-04T13:38:32.2502786Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2502856Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2502901Z configfile: pytest.ini 2025-12-04T13:38:32.2503080Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2503161Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2503405Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2503451Z Running 1 items in this shard 2025-12-04T13:38:32.2503454Z 2025-12-04T13:38:32.2503789Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 13:34:05.889000 427263 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 427332 2025-12-04T13:38:32.2503955Z I1204 13:34:05.889000 427263 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 427333 2025-12-04T13:38:32.2504136Z I1204 13:34:05.890000 427263 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 427334 2025-12-04T13:38:32.2504302Z I1204 13:34:05.890000 427263 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 427335 2025-12-04T13:38:32.2504928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2504972Z _warn_cpu_init() 2025-12-04T13:38:32.2505584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2505630Z _warn_cpu_init() 2025-12-04T13:38:32.2506258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2506298Z _warn_cpu_init() 2025-12-04T13:38:32.2506912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2506955Z _warn_cpu_init() 2025-12-04T13:38:32.2507272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2507332Z return func(*args, **kwargs) 2025-12-04T13:38:32.2507486Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2507675Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2507986Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2508155Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2508461Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2508599Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2508899Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2509072Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2510976Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2511137Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2511437Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2511584Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2511885Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2512078Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2512596Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2512720Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2512935Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2513328Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2513453Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2513697Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2513873Z [rank0]:E1204 13:34:13.767000 427332 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2513931Z dist init r=0, world=4 2025-12-04T13:38:32.2514076Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2514248Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2514558Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2514724Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2515033Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2515168Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2515470Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2515701Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2516000Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2516160Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2516460Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2516609Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2516930Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2517090Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2517597Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2517720Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2517933Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2518330Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2518454Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2518694Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2518870Z [rank2]:E1204 13:34:13.772000 427334 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2518912Z dist init r=2, world=4 2025-12-04T13:38:32.2519061Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2519231Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2519540Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2519761Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2520070Z [rank3]:E1204 13:34:13.787000 427335 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2520238Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2520538Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2520697Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2520995Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2521155Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2521467Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2521616Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2521916Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2522075Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2522589Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:38:32.2522713Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2522940Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2523327Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2523463Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2523693Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2523869Z [rank3]:E1204 13:34:13.787000 427335 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2523911Z dist init r=3, world=4 2025-12-04T13:38:32.2524057Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2524231Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2524540Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2524708Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2525028Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2525162Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2525460Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2525620Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2525928Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2526087Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2526385Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2526531Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2526833Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2526995Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2527521Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2527644Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2527865Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2528250Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2528371Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2528601Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2528779Z [rank1]:E1204 13:34:13.796000 427333 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2528820Z dist init r=1, world=4 2025-12-04T13:38:32.2529181Z [rank0]:[W1204 13:34:14.033471163 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2529222Z FAILED [9.9195s] [100%] 2025-12-04T13:38:32.2529240Z 2025-12-04T13:38:32.2529303Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2529410Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T13:38:32.2529461Z Traceback (most recent call last): 2025-12-04T13:38:32.2529691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2529739Z self._join_processes(fn) 2025-12-04T13:38:32.2529926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2529988Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2530182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2530229Z raise RuntimeError(error) 2025-12-04T13:38:32.2530316Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2530381Z Traceback (most recent call last): 2025-12-04T13:38:32.2530555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2530601Z getattr(self, test_name)() 2025-12-04T13:38:32.2530774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2530811Z fn() 2025-12-04T13:38:32.2530976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2531020Z method(*args, **kwargs) 2025-12-04T13:38:32.2531185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2531228Z method(*args, **kwargs) 2025-12-04T13:38:32.2531393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2531433Z with policy(): 2025-12-04T13:38:32.2531601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2531643Z raise RuntimeError(msg) 2025-12-04T13:38:32.2532032Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:38:32.2532049Z 2025-12-04T13:38:32.2532128Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2532374Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2532378Z 2025-12-04T13:38:32.2532470Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2532474Z 2025-12-04T13:38:32.2532476Z 2025-12-04T13:38:32.2532556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2532651Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2532905Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-dfa5b0c3a6ae8df1.xml - 2025-12-04T13:38:32.2532975Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2533240Z FAILED [9.9195s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2533291Z Traceback (most recent call last): 2025-12-04T13:38:32.2533486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2533534Z getattr(self, test_name)() 2025-12-04T13:38:32.2533704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2533740Z fn() 2025-12-04T13:38:32.2533904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2533948Z method(*args, **kwargs) 2025-12-04T13:38:32.2534113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2534158Z method(*args, **kwargs) 2025-12-04T13:38:32.2534320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2534360Z with policy(): 2025-12-04T13:38:32.2534536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2534581Z raise RuntimeError(msg) 2025-12-04T13:38:32.2534955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2534958Z 2025-12-04T13:38:32.2535038Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2535286Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2535289Z 2025-12-04T13:38:32.2535382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2535451Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
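The leak report above is produced by comparing per-device memory counters taken before and after the test body, both at the caching-allocator level and at the driver level. A minimal sketch of that kind of before/after check follows; it is not the actual torch.testing._internal.common_utils implementation, and the helper names and the simple "both counters grew" criterion are assumptions for illustration only.

    import torch

    def memory_snapshot(device: int) -> tuple[int, int]:
        """Return (caching-allocator bytes, driver-allocated bytes) for one device."""
        allocator_bytes = torch.cuda.memory_allocated(device)      # caching allocator view
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level view
        return allocator_bytes, total_bytes - free_bytes

    def run_with_leak_check(device: int, test_fn) -> None:
        # Snapshot, run the test body, synchronize, then snapshot again.
        alloc_before, driver_before = memory_snapshot(device)
        test_fn()
        torch.cuda.synchronize(device)
        alloc_after, driver_after = memory_snapshot(device)
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )

To reproduce the failure outside CI, the log already prints the exact command, including the PYTORCH_TEST_WITH_ROCM=1 and PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 environment variables that enable ROCm test mode and the leak check; setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses the repro hint.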
2025-12-04T13:38:32.2535518Z ====================== 1 failed, 32 deselected in 10.08s ======================= 2025-12-04T13:38:32.2535558Z Got exit code 1 2025-12-04T13:38:32.2535601Z Retrying single test... 2025-12-04T13:38:32.2535819Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee85649eed6ed9a8.xml 2025-12-04T13:38:32.2535882Z ============================= test session starts ============================== 2025-12-04T13:38:32.2536006Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2536069Z cachedir: .pytest_cache 2025-12-04T13:38:32.2536238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2536288Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2536332Z configfile: pytest.ini 2025-12-04T13:38:32.2536510Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2536588Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2536829Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2536876Z Running 1 items in this shard 2025-12-04T13:38:32.2536878Z 2025-12-04T13:38:32.2537208Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda I1204 13:34:18.472000 427665 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 427734 2025-12-04T13:38:32.2537375Z I1204 13:34:18.473000 427665 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 427735 2025-12-04T13:38:32.2537540Z I1204 13:34:18.473000 427665 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 427736 2025-12-04T13:38:32.2537713Z I1204 13:34:18.474000 427665 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 427737 2025-12-04T13:38:32.2538344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2538384Z _warn_cpu_init() 2025-12-04T13:38:32.2539006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2539050Z _warn_cpu_init() 2025-12-04T13:38:32.2539714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2539756Z _warn_cpu_init() 2025-12-04T13:38:32.2540073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2540118Z return func(*args, **kwargs) 2025-12-04T13:38:32.2540747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2540800Z _warn_cpu_init() 2025-12-04T13:38:32.2540956Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2541129Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2541444Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2541614Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2541918Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2542055Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2542353Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2542529Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2542827Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2542987Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2543290Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2543439Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2543757Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2543916Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2544427Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2544552Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2544764Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2545158Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2545280Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2545523Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2545699Z [rank3]:E1204 13:34:26.322000 427737 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2545744Z dist init r=3, world=4 2025-12-04T13:38:32.2545892Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2546062Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2546371Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2546539Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2546848Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2546996Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2547292Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2547451Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2547751Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2547908Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2548217Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2548363Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2548663Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2548823Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2549334Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2549459Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2549738Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2550117Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2550255Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2550481Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2550660Z [rank2]:E1204 13:34:26.332000 427736 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2550700Z dist init r=2, world=4 2025-12-04T13:38:32.2550851Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2551022Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2551331Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2551496Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2551820Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2551954Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2552251Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2552412Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2552706Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2552878Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2553179Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2553328Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2553631Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2553788Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2554323Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2554444Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2554667Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2555046Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2555173Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2555401Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2555580Z [rank0]:E1204 13:34:26.376000 427734 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2555623Z dist init r=0, world=4 2025-12-04T13:38:32.2555772Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2555943Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2556255Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2556435Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2556741Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2556875Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2557174Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2557344Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.2557643Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2557799Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2558097Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2558244Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2558543Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2558702Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2559223Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2559358Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2559614Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2559998Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2560118Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2560347Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2560524Z [rank1]:E1204 13:34:26.380000 427735 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2560564Z dist init r=1, world=4 2025-12-04T13:38:32.2560929Z [rank0]:[W1204 13:34:26.684230663 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2560985Z FAILED [9.7212s] [100%] 2025-12-04T13:38:32.2560987Z 2025-12-04T13:38:32.2561050Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2561156Z ____ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda ____ 2025-12-04T13:38:32.2561206Z Traceback (most recent call last): 2025-12-04T13:38:32.2561383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2561432Z self._join_processes(fn) 2025-12-04T13:38:32.2561619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2561679Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2561886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2561935Z raise RuntimeError(error) 2025-12-04T13:38:32.2562019Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2562068Z Traceback (most recent call last): 2025-12-04T13:38:32.2562243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2562292Z getattr(self, test_name)() 2025-12-04T13:38:32.2562465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2562500Z fn() 2025-12-04T13:38:32.2562665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2562710Z method(*args, **kwargs) 2025-12-04T13:38:32.2562874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2562917Z method(*args, **kwargs) 2025-12-04T13:38:32.2563081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2563135Z with policy(): 2025-12-04T13:38:32.2563308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2563367Z raise RuntimeError(msg) 2025-12-04T13:38:32.2563744Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:38:32.2563746Z 2025-12-04T13:38:32.2563827Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2564076Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2564079Z 2025-12-04T13:38:32.2564173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2564179Z 2025-12-04T13:38:32.2564181Z 2025-12-04T13:38:32.2564261Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2564358Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2564609Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee85649eed6ed9a8.xml - 2025-12-04T13:38:32.2564675Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2564954Z FAILED [9.7212s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2565003Z Traceback (most recent call last): 2025-12-04T13:38:32.2565180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2565228Z getattr(self, test_name)() 2025-12-04T13:38:32.2565398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2565437Z fn() 2025-12-04T13:38:32.2565601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2565645Z method(*args, **kwargs) 2025-12-04T13:38:32.2565808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2565852Z method(*args, **kwargs) 2025-12-04T13:38:32.2566026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2566068Z with policy(): 2025-12-04T13:38:32.2566232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2566277Z raise RuntimeError(msg) 2025-12-04T13:38:32.2566654Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2566658Z 2025-12-04T13:38:32.2566738Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2566985Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2566989Z 2025-12-04T13:38:32.2567081Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2567151Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
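Each run above also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit. A minimal sketch of the teardown that the warning (and the linked shutdown documentation) asks for is shown below; the env:// style initialization and the placeholder barrier are assumptions for illustration, not the test harness's actual setup code.

    import torch.distributed as dist

    def main() -> None:
        # Rank, world size, and rendezvous address are read from the standard
        # environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT).
        dist.init_process_group(backend="nccl")
        try:
            dist.barrier()  # placeholder for the real distributed work
        finally:
            # Explicit teardown avoids the "can leak resources" warning at exit.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()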
2025-12-04T13:38:32.2567226Z ======================= 1 failed, 32 deselected in 9.88s ======================= 2025-12-04T13:38:32.2567267Z Got exit code 1 2025-12-04T13:38:32.2567459Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda 2025-12-04T13:38:32.2567611Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2567813Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e520f518f8fb1ae8.xml 2025-12-04T13:38:32.2567878Z ============================= test session starts ============================== 2025-12-04T13:38:32.2568004Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2568049Z cachedir: .pytest_cache 2025-12-04T13:38:32.2568222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2568271Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2568315Z configfile: pytest.ini 2025-12-04T13:38:32.2568489Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2568570Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T13:38:32.2568627Z stepcurrent: skipping 26 already run items. 2025-12-04T13:38:32.2568673Z Running 7 items in this shard 2025-12-04T13:38:32.2568676Z 2025-12-04T13:38:32.2569019Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda I1204 13:34:30.792000 428067 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 428136 2025-12-04T13:38:32.2569198Z I1204 13:34:30.792000 428067 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 428137 2025-12-04T13:38:32.2569365Z I1204 13:34:30.793000 428067 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 428138 2025-12-04T13:38:32.2569529Z I1204 13:34:30.793000 428067 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 428139 2025-12-04T13:38:32.2570222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2570265Z _warn_cpu_init() 2025-12-04T13:38:32.2570882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2570922Z _warn_cpu_init() 2025-12-04T13:38:32.2571536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2571577Z _warn_cpu_init() 2025-12-04T13:38:32.2571904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2571952Z return func(*args, **kwargs) 2025-12-04T13:38:32.2572575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2572617Z _warn_cpu_init() 2025-12-04T13:38:32.2572770Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2572945Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2573260Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2573430Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2573737Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2573888Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2574193Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2574350Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2574650Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2574807Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2575120Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2575269Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2575568Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2575729Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2576256Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2576382Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2576603Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2577010Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2577133Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2577360Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2577536Z [rank2]:E1204 13:34:38.681000 428138 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2577578Z dist init r=2, world=4 2025-12-04T13:38:32.2577725Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2577896Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2578206Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2578386Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2578693Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2578827Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2579123Z [rank3]:E1204 13:34:38.682000 428139 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2579282Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2579637Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2579798Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2580093Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2580239Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2580537Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2580697Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2581241Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:38:32.2581377Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2581588Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2581980Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2582104Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2582330Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2582505Z [rank3]:E1204 13:34:38.682000 428139 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2582548Z dist init r=3, world=4 2025-12-04T13:38:32.2582694Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2582865Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2583188Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2583355Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2583660Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2583793Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2584103Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2584261Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2584557Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2584714Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2585013Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2585156Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2585456Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2585628Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2586153Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2586287Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2586498Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2586892Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2587013Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2587238Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2587414Z [rank1]:E1204 13:34:38.731000 428137 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2587478Z dist init r=1, world=4 2025-12-04T13:38:32.2587625Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2587794Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2588104Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2588269Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2588579Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2588724Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2589023Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2589180Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] 
method(*args, **kwargs) 2025-12-04T13:38:32.2589478Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2589685Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2589983Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2590129Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2590439Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2590611Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2591136Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2591259Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2591469Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2591860Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2591983Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2592225Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2592400Z [rank0]:E1204 13:34:38.739000 428136 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2592441Z dist init r=0, world=4 2025-12-04T13:38:32.2592800Z [rank0]:[W1204 13:34:39.071627987 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2592843Z FAILED [9.8194s] [ 14%] 2025-12-04T13:38:32.2592845Z 2025-12-04T13:38:32.2592903Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2593018Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2593068Z Traceback (most recent call last): 2025-12-04T13:38:32.2593260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2593305Z self._join_processes(fn) 2025-12-04T13:38:32.2593494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2593551Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2593742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2593789Z raise RuntimeError(error) 2025-12-04T13:38:32.2593874Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2593922Z Traceback (most recent call last): 2025-12-04T13:38:32.2594095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2594141Z getattr(self, test_name)() 2025-12-04T13:38:32.2594313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2594350Z fn() 2025-12-04T13:38:32.2594522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2594566Z method(*args, **kwargs) 2025-12-04T13:38:32.2594727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2594783Z method(*args, **kwargs) 2025-12-04T13:38:32.2594945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2594984Z with policy(): 2025-12-04T13:38:32.2595147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2595193Z raise RuntimeError(msg) 2025-12-04T13:38:32.2595581Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2595584Z 2025-12-04T13:38:32.2595666Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2595927Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2595932Z 2025-12-04T13:38:32.2596025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2596028Z 2025-12-04T13:38:32.2596091Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2596152Z Traceback (most recent call last): 2025-12-04T13:38:32.2596329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2596373Z getattr(self, test_name)() 2025-12-04T13:38:32.2596547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2596583Z fn() 2025-12-04T13:38:32.2596746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2596789Z method(*args, **kwargs) 2025-12-04T13:38:32.2596951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2596992Z method(*args, **kwargs) 2025-12-04T13:38:32.2597155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2597194Z with policy(): 2025-12-04T13:38:32.2597369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2597412Z raise RuntimeError(msg) 2025-12-04T13:38:32.2597798Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2597801Z 2025-12-04T13:38:32.2597879Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2598135Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2598138Z 2025-12-04T13:38:32.2598231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2598234Z 2025-12-04T13:38:32.2598237Z 2025-12-04T13:38:32.2598316Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2598412Z Process 2 terminated with exit code 10, terminating remaining processes. 
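The RuntimeError above comes from the test harness's CUDA memory leak check, which snapshots the caching allocator's allocated bytes on each device before the test and compares them after it finishes (the message also reports driver-allocated memory). A minimal, simplified sketch of that before/after comparison, not the actual CudaMemoryLeakCheck implementation in common_utils.py; the device index and the wrapper name are illustrative:

    import torch

    def check_leak(run_test, device=0):
        # Simplified illustration of the comparison the harness performs.
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)  # caching-allocator bytes before the test
        run_test()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()  # return unused cached blocks to the driver before re-reading
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible leak on device {device}: allocated went from {before} to {after} bytes"
            )

With PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the harness appears to apply an equivalent check around every test, which is why each failure above reports both the caching-allocator and driver-allocated byte counts per device.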
2025-12-04T13:38:32.2598673Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e520f518f8fb1ae8.xml - 2025-12-04T13:38:32.2598738Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2599023Z FAILED [9.8194s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2599072Z Traceback (most recent call last): 2025-12-04T13:38:32.2599248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2599295Z getattr(self, test_name)() 2025-12-04T13:38:32.2599467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2599504Z fn() 2025-12-04T13:38:32.2599720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2599765Z method(*args, **kwargs) 2025-12-04T13:38:32.2599929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2599972Z method(*args, **kwargs) 2025-12-04T13:38:32.2600134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2600172Z with policy(): 2025-12-04T13:38:32.2600335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2600395Z raise RuntimeError(msg) 2025-12-04T13:38:32.2600783Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2600785Z 2025-12-04T13:38:32.2600864Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2601123Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2601125Z 2025-12-04T13:38:32.2601216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2601218Z 2025-12-04T13:38:32.2601282Z Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2601329Z Traceback (most recent call last): 2025-12-04T13:38:32.2601518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2601563Z getattr(self, test_name)() 2025-12-04T13:38:32.2601736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2601772Z fn() 2025-12-04T13:38:32.2601934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2601978Z method(*args, **kwargs) 2025-12-04T13:38:32.2602139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2602181Z method(*args, **kwargs) 2025-12-04T13:38:32.2602342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2602384Z with policy(): 2025-12-04T13:38:32.2602547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2602591Z raise RuntimeError(msg) 2025-12-04T13:38:32.2602990Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2603014Z 2025-12-04T13:38:32.2603094Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2603350Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2603354Z 2025-12-04T13:38:32.2603446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2603516Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2603581Z ======================= 1 failed, 26 deselected in 9.96s ======================= 2025-12-04T13:38:32.2603621Z Got exit code 1 2025-12-04T13:38:32.2603663Z Retrying single test... 
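Each failure prints a ready-to-run repro command with the required environment variables. If you want to drive that same invocation from Python (for example from a bisection or flake-hunting script), a sketch assuming you are at the base repo dir; the command and variables are taken verbatim from the repro instructions above:

    import os
    import subprocess

    # Environment variables copied from the repro instructions in the log.
    env = dict(os.environ, PYTORCH_TEST_WITH_ROCM="1", PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # Optional: silence the repro banner, as the log notes.
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"

    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda",
        ],
        env=env,
        check=True,
    )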
2025-12-04T13:38:32.2603865Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71107c8e73731f92.xml 2025-12-04T13:38:32.2603927Z ============================= test session starts ============================== 2025-12-04T13:38:32.2604048Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2604091Z cachedir: .pytest_cache 2025-12-04T13:38:32.2604260Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2604323Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2604367Z configfile: pytest.ini 2025-12-04T13:38:32.2604542Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2604621Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2604871Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2604918Z Running 1 items in this shard 2025-12-04T13:38:32.2604921Z 2025-12-04T13:38:32.2605258Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda I1204 13:34:43.303000 428469 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 428538 2025-12-04T13:38:32.2605425Z I1204 13:34:43.304000 428469 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 428539 2025-12-04T13:38:32.2605604Z I1204 13:34:43.304000 428469 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 428540 2025-12-04T13:38:32.2605770Z I1204 13:34:43.305000 428469 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 428541 2025-12-04T13:38:32.2606398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2606438Z _warn_cpu_init() 2025-12-04T13:38:32.2607063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2607105Z _warn_cpu_init() 2025-12-04T13:38:32.2607716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2607769Z _warn_cpu_init() 2025-12-04T13:38:32.2608384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2608423Z _warn_cpu_init() 2025-12-04T13:38:32.2608741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2608788Z return func(*args, **kwargs) 2025-12-04T13:38:32.2608942Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2609128Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2609441Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2609650Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2609960Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2610093Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2610405Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2610563Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2610863Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2611023Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2611319Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2611469Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2611780Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2611939Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2612477Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2612601Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2612813Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2613206Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2613330Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2613558Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2613751Z [rank0]:E1204 13:34:51.093000 428538 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2613791Z dist init r=0, world=4 2025-12-04T13:38:32.2613940Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2614114Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2614423Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2614589Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2614907Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2615046Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2615345Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2615504Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2615803Z [rank2]:E1204 13:34:51.095000 428540 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2615963Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2616260Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2616417Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2616716Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2616886Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2617407Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 2025-12-04T13:38:32.2617532Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2617740Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2618138Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2618271Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2618503Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2618683Z [rank2]:E1204 13:34:51.095000 428540 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2618725Z dist init r=2, world=4 2025-12-04T13:38:32.2618870Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2619043Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2619354Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2619530Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2619882Z 
[rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2620013Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2620315Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2620472Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2620773Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2620951Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2621246Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2621411Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2621710Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2621871Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2622392Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
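The leaked amounts reported above vary per rank and per retry (roughly 12-25 KB of caching-allocator memory), so when chasing this locally it can help to dump what the allocator still holds on each device right after the suspected leak point. A small sketch using standard torch.cuda introspection; nothing here is specific to this test:

    import torch

    # Print per-device caching-allocator state for all visible devices.
    for device in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(device)
        reserved = torch.cuda.memory_reserved(device)
        print(f"device {device}: allocated={allocated} bytes, reserved={reserved} bytes")
        print(torch.cuda.memory_summary(device=device, abbreviated=True))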
2025-12-04T13:38:32.2622515Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2622724Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2623132Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2623254Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2623484Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2623660Z [rank3]:E1204 13:34:51.099000 428541 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2623700Z dist init r=3, world=4 2025-12-04T13:38:32.2623848Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2624032Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2624341Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2624507Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2624812Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2624944Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2625244Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2625404Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2625710Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2625880Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2626180Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2626328Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2626629Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2626788Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2627313Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2627447Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2627657Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2628051Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2628172Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2628399Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2628588Z [rank1]:E1204 13:34:51.180000 428539 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2628630Z dist init r=1, world=4 2025-12-04T13:38:32.2628991Z [rank0]:[W1204 13:34:51.269833344 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2629033Z FAILED [9.6209s] [100%] 2025-12-04T13:38:32.2629035Z 2025-12-04T13:38:32.2629093Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2629210Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2629259Z Traceback (most recent call last): 2025-12-04T13:38:32.2629433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2629481Z self._join_processes(fn) 2025-12-04T13:38:32.2629711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2629770Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2629973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2630020Z raise RuntimeError(error) 2025-12-04T13:38:32.2630103Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2630166Z Traceback (most recent call last): 2025-12-04T13:38:32.2630338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2630383Z getattr(self, test_name)() 2025-12-04T13:38:32.2630553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2630591Z fn() 2025-12-04T13:38:32.2630755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2630798Z method(*args, **kwargs) 2025-12-04T13:38:32.2630962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2631006Z method(*args, **kwargs) 2025-12-04T13:38:32.2631169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2631210Z with policy(): 2025-12-04T13:38:32.2631374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2631419Z raise RuntimeError(msg) 2025-12-04T13:38:32.2634704Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 
2025-12-04T13:38:32.2634737Z 2025-12-04T13:38:32.2634827Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2635098Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2635100Z 2025-12-04T13:38:32.2635197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2635200Z 2025-12-04T13:38:32.2635202Z 2025-12-04T13:38:32.2635285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2635382Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2635652Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-71107c8e73731f92.xml - 2025-12-04T13:38:32.2635718Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2635999Z FAILED [9.6209s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2636047Z Traceback (most recent call last): 2025-12-04T13:38:32.2636226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2636272Z getattr(self, test_name)() 2025-12-04T13:38:32.2636449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2636486Z fn() 2025-12-04T13:38:32.2636651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2636695Z method(*args, **kwargs) 2025-12-04T13:38:32.2636858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2636900Z method(*args, **kwargs) 2025-12-04T13:38:32.2637076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2637115Z with policy(): 2025-12-04T13:38:32.2637282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2637337Z raise RuntimeError(msg) 2025-12-04T13:38:32.2637726Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2637730Z 2025-12-04T13:38:32.2637811Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2638074Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2638076Z 2025-12-04T13:38:32.2638170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2638238Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
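The UserWarning from _init_utils.py repeated in this run recommends passing `device_id` so FSDP moves a CPU module to the GPU before running sharding initialization (and notes that `sync_module_states=True` requires GPU communication). A minimal sketch of that pattern; the single-process group setup and the Linear module are placeholders, not the nested model this test actually wraps:

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Single-process setup just so the sketch runs standalone; the test spawns 4 ranks.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    module = nn.Linear(8, 8)  # placeholder CPU module

    # device_id lets FSDP move the CPU module to the GPU for sharding
    # initialization, as the warning recommends; sync_module_states needs it.
    fsdp_model = FSDP(module, device_id=torch.cuda.current_device(), sync_module_states=True)

    dist.destroy_process_group()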
2025-12-04T13:38:32.2638306Z ======================= 1 failed, 32 deselected in 9.79s ======================= 2025-12-04T13:38:32.2638345Z Got exit code 1 2025-12-04T13:38:32.2638388Z Retrying single test... 2025-12-04T13:38:32.2638593Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-60946723aa831dfe.xml 2025-12-04T13:38:32.2638669Z ============================= test session starts ============================== 2025-12-04T13:38:32.2638793Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2638837Z cachedir: .pytest_cache 2025-12-04T13:38:32.2639011Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2639061Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2639103Z configfile: pytest.ini 2025-12-04T13:38:32.2639282Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2639361Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2639659Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2639707Z Running 1 items in this shard 2025-12-04T13:38:32.2639709Z 2025-12-04T13:38:32.2640071Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda I1204 13:34:55.621000 428871 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 428940 2025-12-04T13:38:32.2640240Z I1204 13:34:55.622000 428871 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 428941 2025-12-04T13:38:32.2640402Z I1204 13:34:55.622000 428871 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 428942 2025-12-04T13:38:32.2640564Z I1204 13:34:55.623000 428871 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 428943 2025-12-04T13:38:32.2641196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2641238Z _warn_cpu_init() 2025-12-04T13:38:32.2641872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2641925Z _warn_cpu_init() 2025-12-04T13:38:32.2642243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. 
You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2642288Z return func(*args, **kwargs) 2025-12-04T13:38:32.2642906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2642945Z _warn_cpu_init() 2025-12-04T13:38:32.2643556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2643610Z _warn_cpu_init() 2025-12-04T13:38:32.2643766Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2643940Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2644254Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2644420Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2644737Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2644873Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2645177Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2645337Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2645636Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2645795Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2646103Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2646250Z [rank3]:E1204 13:35:03.388000 428943 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2646568Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2646727Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2647254Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2647379Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2647589Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2647984Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2648120Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2648347Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2648526Z [rank3]:E1204 13:35:03.388000 428943 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2648567Z dist init r=3, world=4 2025-12-04T13:38:32.2648716Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2648887Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2649211Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2649375Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2649729Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2649862Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2650160Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2650320Z [rank0]:E1204 
13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2650616Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2650789Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2651083Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2651244Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2651546Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2651707Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2652232Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3963617280. 2025-12-04T13:38:32.2652354Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2652564Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2652971Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2653094Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2653321Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2653496Z [rank0]:E1204 13:35:03.395000 428940 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2653537Z dist init r=0, world=4 2025-12-04T13:38:32.2653683Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2653868Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2654178Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 
2025-12-04T13:38:32.2654342Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2654647Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2654778Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2655077Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2655245Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2655544Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2655711Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2656010Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2656156Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2656458Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2656619Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2657144Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 2. CUDA driver allocated memory was 2300575744 and is now 3810525184. 
2025-12-04T13:38:32.2657285Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2657492Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2657886Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2658008Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2658235Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2658424Z [rank2]:E1204 13:35:03.446000 428942 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2658464Z dist init r=2, world=4 2025-12-04T13:38:32.2658611Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2658783Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2659093Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2659257Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2659564Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2659745Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2660057Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2660229Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2660524Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2660683Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2660978Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2661124Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2661422Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2661582Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2662121Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 1. CUDA driver allocated memory was 2317352960 and is now 3827302400. 2025-12-04T13:38:32.2662243Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2662452Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2662845Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2662983Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2663210Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2663389Z [rank1]:E1204 13:35:03.448000 428941 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2663430Z dist init r=1, world=4 2025-12-04T13:38:32.2663794Z [rank0]:[W1204 13:35:03.621597869 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2663837Z FAILED [9.6206s] [100%] 2025-12-04T13:38:32.2663839Z 2025-12-04T13:38:32.2663899Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2664017Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2664066Z Traceback (most recent call last): 2025-12-04T13:38:32.2664243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2664301Z self._join_processes(fn) 2025-12-04T13:38:32.2664487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2664558Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2664750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2664797Z raise RuntimeError(error) 2025-12-04T13:38:32.2664883Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2664932Z Traceback (most recent call last): 2025-12-04T13:38:32.2665107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2665152Z getattr(self, test_name)() 2025-12-04T13:38:32.2665324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2665360Z fn() 2025-12-04T13:38:32.2665524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2665567Z method(*args, **kwargs) 2025-12-04T13:38:32.2665731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2665774Z method(*args, **kwargs) 2025-12-04T13:38:32.2665935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2665989Z with policy(): 2025-12-04T13:38:32.2666154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2666198Z raise RuntimeError(msg) 2025-12-04T13:38:32.2666591Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 
2025-12-04T13:38:32.2666594Z 2025-12-04T13:38:32.2666675Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2666933Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2666935Z 2025-12-04T13:38:32.2667031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2667034Z 2025-12-04T13:38:32.2667049Z 2025-12-04T13:38:32.2667132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2667225Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2667475Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-60946723aa831dfe.xml - 2025-12-04T13:38:32.2667539Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2667815Z FAILED [9.6206s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2667863Z Traceback (most recent call last): 2025-12-04T13:38:32.2668040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2668085Z getattr(self, test_name)() 2025-12-04T13:38:32.2668259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2668296Z fn() 2025-12-04T13:38:32.2668474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2668516Z method(*args, **kwargs) 2025-12-04T13:38:32.2668680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2668735Z method(*args, **kwargs) 2025-12-04T13:38:32.2668900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2668938Z with policy(): 2025-12-04T13:38:32.2669104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2669149Z raise RuntimeError(msg) 2025-12-04T13:38:32.2669537Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3760193536. 2025-12-04T13:38:32.2669539Z 2025-12-04T13:38:32.2669667Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2669926Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2669929Z 2025-12-04T13:38:32.2670021Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2670088Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
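The repro command printed above can also be driven from Python; a hedged sketch using subprocess, assuming the same checkout layout as this job (the path, test id, and environment variables are copied from the failure output, nothing else is known about the local setup):

    import os
    import subprocess

    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
        # Setting PYTORCH_PRINT_REPRO_ON_FAILURE="0" instead suppresses the repro banner.
    )
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_nested_wrapped_model_offload_true_shard_grad_op_cuda",
        ],
        env=env,
        check=True,  # raises CalledProcessError if the leak reproduces and the run exits non-zero
    )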
2025-12-04T13:38:32.2670173Z ======================= 1 failed, 32 deselected in 9.78s ======================= 2025-12-04T13:38:32.2670212Z Got exit code 1 2025-12-04T13:38:32.2670413Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2670551Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2670752Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6033c1bcb9217540.xml 2025-12-04T13:38:32.2670815Z ============================= test session starts ============================== 2025-12-04T13:38:32.2670938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2670982Z cachedir: .pytest_cache 2025-12-04T13:38:32.2671153Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2671218Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2671261Z configfile: pytest.ini 2025-12-04T13:38:32.2671441Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2671521Z collecting ... collected 60 items / 27 deselected / 33 selected 2025-12-04T13:38:32.2671578Z stepcurrent: skipping 27 already run items. 2025-12-04T13:38:32.2671622Z Running 6 items in this shard 2025-12-04T13:38:32.2671626Z 2025-12-04T13:38:32.2672018Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda I1204 13:35:07.833000 429273 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 429342 2025-12-04T13:38:32.2672183Z I1204 13:35:07.833000 429273 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 429343 2025-12-04T13:38:32.2672348Z I1204 13:35:07.834000 429273 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 429344 2025-12-04T13:38:32.2672510Z I1204 13:35:07.834000 429273 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 429345 2025-12-04T13:38:32.2673159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2673221Z _warn_cpu_init() 2025-12-04T13:38:32.2673837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
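The repeated FSDP UserWarning above recommends passing device_id so that sharding initialization runs on GPU rather than CPU. A minimal sketch of that call, assuming the process group has already been initialized by the harness; the module and the wrap_on_gpu helper are hypothetical stand-ins, not the test's own model:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: nn.Module, rank: int) -> FSDP:
        # Assumes torch.distributed.init_process_group(...) was already called;
        # `rank` indexes the local GPU for this process.
        return FSDP(
            module,                                 # may start on CPU, as in the warning above
            device_id=torch.device("cuda", rank),   # moves it to GPU before sharding init
            sync_module_states=True,                # needs the module on GPU, hence device_id
        )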
2025-12-04T13:38:32.2673880Z _warn_cpu_init() 2025-12-04T13:38:32.2674498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2674559Z _warn_cpu_init() 2025-12-04T13:38:32.2675174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2675212Z _warn_cpu_init() 2025-12-04T13:38:32.2675528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2675573Z return func(*args, **kwargs) 2025-12-04T13:38:32.2675726Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2675913Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2676227Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2676395Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2676707Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2676842Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2677141Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2677300Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2677608Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2677780Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2678077Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2678227Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2678528Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2678688Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2679270Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528. 2025-12-04T13:38:32.2679407Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2679696Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2680147Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2680270Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2680498Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2680692Z [rank0]:E1204 13:35:13.588000 429342 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2680734Z dist init r=0, world=4 2025-12-04T13:38:32.2680880Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2681053Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2681366Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2681535Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2681847Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2681980Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2682301Z [rank1]:E1204 
13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2682474Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2682773Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2682931Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2683229Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2683377Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2683677Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2683836Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2684432Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 1. CUDA driver allocated memory was 2317352960 and is now 3458203648. 
2025-12-04T13:38:32.2684557Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2684766Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2685215Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2685350Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2685576Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2685753Z [rank1]:E1204 13:35:13.589000 429343 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2685793Z dist init r=1, world=4 2025-12-04T13:38:32.2685941Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2686111Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2686420Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2686584Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2686905Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2687051Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2687348Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2687508Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2687805Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2687964Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2688258Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2688404Z [rank3]:E1204 13:35:13.644000 429345 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2688704Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2688878Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2689455Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784. 2025-12-04T13:38:32.2689626Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2689857Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2690302Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2690424Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2690653Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2690828Z [rank3]:E1204 13:35:13.644000 429345 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2690871Z dist init r=3, world=4 2025-12-04T13:38:32.2691018Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2691189Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2691522Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2691706Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2692013Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2692147Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2692446Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T13:38:32.2692604Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2692902Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2693059Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2693357Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2693522Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2693821Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2693981Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2694575Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 2300575744 and is now 3441426432. 2025-12-04T13:38:32.2694699Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2694909Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2695352Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2695474Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2695702Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2695878Z [rank2]:E1204 13:35:13.645000 429344 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2695917Z dist init r=2, world=4 2025-12-04T13:38:32.2696293Z [rank0]:[W1204 13:35:13.768870480 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2696346Z FAILED [7.4165s] [ 16%] 2025-12-04T13:38:32.2696349Z 2025-12-04T13:38:32.2696409Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2696570Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda _ 2025-12-04T13:38:32.2696621Z Traceback (most recent call last): 2025-12-04T13:38:32.2696802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2696850Z self._join_processes(fn) 2025-12-04T13:38:32.2697039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2697096Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2697286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2697333Z raise RuntimeError(error) 2025-12-04T13:38:32.2697419Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2697466Z Traceback (most recent call last): 2025-12-04T13:38:32.2697638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2697698Z getattr(self, test_name)() 2025-12-04T13:38:32.2697869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2697907Z fn() 2025-12-04T13:38:32.2698071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2698118Z method(*args, **kwargs) 2025-12-04T13:38:32.2698280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2698327Z method(*args, **kwargs) 2025-12-04T13:38:32.2698491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2698533Z with policy(): 2025-12-04T13:38:32.2698697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2698756Z raise RuntimeError(msg) 2025-12-04T13:38:32.2699191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528. 
2025-12-04T13:38:32.2699195Z 2025-12-04T13:38:32.2699276Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2699640Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2699642Z 2025-12-04T13:38:32.2699736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2699740Z 2025-12-04T13:38:32.2699742Z 2025-12-04T13:38:32.2699825Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2699920Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2700187Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6033c1bcb9217540.xml - 2025-12-04T13:38:32.2700252Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2700589Z FAILED [7.4165s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2700640Z Traceback (most recent call last): 2025-12-04T13:38:32.2700815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2700862Z getattr(self, test_name)() 2025-12-04T13:38:32.2701036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2701074Z fn() 2025-12-04T13:38:32.2701237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2701280Z method(*args, **kwargs) 2025-12-04T13:38:32.2701442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2701487Z method(*args, **kwargs) 2025-12-04T13:38:32.2701648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2701689Z with policy(): 2025-12-04T13:38:32.2701854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2701914Z raise RuntimeError(msg) 2025-12-04T13:38:32.2702348Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 57856 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528. 2025-12-04T13:38:32.2702351Z 2025-12-04T13:38:32.2702432Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2702743Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2702746Z 2025-12-04T13:38:32.2702838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2702908Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2702991Z ======================= 1 failed, 27 deselected in 7.56s ======================= 2025-12-04T13:38:32.2703032Z Got exit code 1 2025-12-04T13:38:32.2703075Z Retrying single test... 2025-12-04T13:38:32.2703279Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4586b11fe6650479.xml 2025-12-04T13:38:32.2703341Z ============================= test session starts ============================== 2025-12-04T13:38:32.2703464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2703508Z cachedir: .pytest_cache 2025-12-04T13:38:32.2703680Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2703728Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2703772Z configfile: pytest.ini 2025-12-04T13:38:32.2703949Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2704029Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2704344Z stepcurrent: skipping 27 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2704393Z Running 1 items in this shard 2025-12-04T13:38:32.2704395Z 2025-12-04T13:38:32.2704795Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda I1204 13:35:17.956000 429675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 429744 2025-12-04T13:38:32.2704962Z I1204 13:35:17.957000 429675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 429745 2025-12-04T13:38:32.2705129Z I1204 13:35:17.957000 429675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 429746 2025-12-04T13:38:32.2705291Z I1204 13:35:17.958000 429675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 429747 2025-12-04T13:38:32.2705928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2705968Z _warn_cpu_init() 2025-12-04T13:38:32.2706587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2706642Z _warn_cpu_init() 2025-12-04T13:38:32.2707255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2707297Z _warn_cpu_init() 2025-12-04T13:38:32.2707931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2707973Z _warn_cpu_init() 2025-12-04T13:38:32.2708289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2708335Z return func(*args, **kwargs) 2025-12-04T13:38:32.2708487Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2708661Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2708975Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2709152Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2709463Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2709663Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2709969Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2710130Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2710433Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2710593Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2710889Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2711055Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2711353Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2711515Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2712094Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3458203648. 2025-12-04T13:38:32.2712220Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2712447Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2712893Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2713017Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2713246Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2713424Z [rank1]:E1204 13:35:23.684000 429745 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2713466Z dist init r=1, world=4 2025-12-04T13:38:32.2713613Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2713802Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2714111Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2714294Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2714602Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2714738Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2715038Z [rank3]:E1204 
13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2715195Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2715494Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2715664Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2715962Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2716107Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2716408Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2716569Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2717156Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784. 
2025-12-04T13:38:32.2717282Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2717491Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2717936Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2718059Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2718287Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2718475Z [rank3]:E1204 13:35:23.687000 429747 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2718516Z dist init r=3, world=4 2025-12-04T13:38:32.2718676Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2718847Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2719159Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2719324Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2719658Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2719790Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2720091Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2720250Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2720564Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2720724Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2721020Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2721168Z [rank0]:E1204 13:35:23.703000 429744 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2721483Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2721643Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2722216Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528. 2025-12-04T13:38:32.2722340Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2722550Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2722992Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2723128Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2723356Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2723546Z [rank0]:E1204 13:35:23.703000 429744 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2723589Z dist init r=0, world=4 2025-12-04T13:38:32.2723736Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2723908Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2724216Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2724382Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2724690Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2724824Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2725140Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T13:38:32.2725299Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2725597Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2725755Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2726066Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2726213Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2726512Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2726672Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2727247Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 2300575744 and is now 3441426432. 2025-12-04T13:38:32.2727371Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2727593Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2728035Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2728166Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2728396Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2728574Z [rank2]:E1204 13:35:23.734000 429746 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2728615Z dist init r=2, world=4 2025-12-04T13:38:32.2728977Z [rank0]:[W1204 13:35:23.909509773 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:38:32.2729020Z FAILED [7.4160s] [100%]
2025-12-04T13:38:32.2729022Z 
2025-12-04T13:38:32.2729082Z =================================== FAILURES ===================================
2025-12-04T13:38:32.2729244Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda _
2025-12-04T13:38:32.2729308Z Traceback (most recent call last):
2025-12-04T13:38:32.2729484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:38:32.2729532Z self._join_processes(fn)
2025-12-04T13:38:32.2729771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:38:32.2729831Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:38:32.2730024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:38:32.2730072Z raise RuntimeError(error)
2025-12-04T13:38:32.2730157Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:38:32.2730207Z Traceback (most recent call last):
2025-12-04T13:38:32.2730380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2730428Z getattr(self, test_name)()
2025-12-04T13:38:32.2730616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2730652Z fn()
2025-12-04T13:38:32.2730818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2730861Z method(*args, **kwargs)
2025-12-04T13:38:32.2731027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2731070Z method(*args, **kwargs)
2025-12-04T13:38:32.2731234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2731273Z with policy():
2025-12-04T13:38:32.2731439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2731483Z raise RuntimeError(msg)
2025-12-04T13:38:32.2731937Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528.
2025-12-04T13:38:32.2731939Z 
2025-12-04T13:38:32.2732020Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2732347Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2732349Z 
2025-12-04T13:38:32.2732444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2732446Z 
2025-12-04T13:38:32.2732510Z Process 1 exited with error code 10 and exception:
2025-12-04T13:38:32.2732560Z Traceback (most recent call last):
2025-12-04T13:38:32.2732733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2732779Z getattr(self, test_name)()
2025-12-04T13:38:32.2732951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2732989Z fn()
2025-12-04T13:38:32.2733152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2733196Z method(*args, **kwargs)
2025-12-04T13:38:32.2733358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2733401Z method(*args, **kwargs)
2025-12-04T13:38:32.2733561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2733619Z with policy():
2025-12-04T13:38:32.2733781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2733826Z raise RuntimeError(msg)
2025-12-04T13:38:32.2734257Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3458203648.
2025-12-04T13:38:32.2734262Z 
2025-12-04T13:38:32.2734342Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2734654Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2734657Z 
2025-12-04T13:38:32.2734760Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2734762Z 
2025-12-04T13:38:32.2734826Z Process 3 exited with error code 10 and exception:
2025-12-04T13:38:32.2734873Z Traceback (most recent call last):
2025-12-04T13:38:32.2735051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2735095Z getattr(self, test_name)()
2025-12-04T13:38:32.2735268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2735304Z fn()
2025-12-04T13:38:32.2735467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2735507Z method(*args, **kwargs)
2025-12-04T13:38:32.2735672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2735714Z method(*args, **kwargs)
2025-12-04T13:38:32.2735875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2735914Z with policy():
2025-12-04T13:38:32.2736091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2736135Z raise RuntimeError(msg)
2025-12-04T13:38:32.2736578Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784.
2025-12-04T13:38:32.2736580Z 
2025-12-04T13:38:32.2736662Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2736972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2736974Z 
2025-12-04T13:38:32.2737069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2737072Z 
2025-12-04T13:38:32.2737074Z 
2025-12-04T13:38:32.2737153Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:38:32.2737251Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:38:32.2737500Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4586b11fe6650479.xml -
2025-12-04T13:38:32.2737567Z =========================== short test summary info ============================
2025-12-04T13:38:32.2738011Z FAILED [7.4160s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T13:38:32.2738059Z Traceback (most recent call last):
2025-12-04T13:38:32.2738239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2738283Z getattr(self, test_name)()
2025-12-04T13:38:32.2738458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2738495Z fn()
2025-12-04T13:38:32.2738659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2738701Z method(*args, **kwargs)
2025-12-04T13:38:32.2738867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2741747Z method(*args, **kwargs)
2025-12-04T13:38:32.2741920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2741959Z with policy():
2025-12-04T13:38:32.2742127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2742169Z raise RuntimeError(msg)
2025-12-04T13:38:32.2742602Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528.
2025-12-04T13:38:32.2742606Z 
2025-12-04T13:38:32.2742686Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2742993Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2742995Z 
2025-12-04T13:38:32.2743088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2743090Z 
2025-12-04T13:38:32.2743174Z Process 1 exited with error code 10 and exception:
2025-12-04T13:38:32.2743225Z Traceback (most recent call last):
2025-12-04T13:38:32.2743400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2743470Z getattr(self, test_name)()
2025-12-04T13:38:32.2743642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2743680Z fn()
2025-12-04T13:38:32.2743842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2743886Z method(*args, **kwargs)
2025-12-04T13:38:32.2744048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2744090Z method(*args, **kwargs)
2025-12-04T13:38:32.2744253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2744294Z with policy():
2025-12-04T13:38:32.2744459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2744504Z raise RuntimeError(msg)
2025-12-04T13:38:32.2744939Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 2317352960 and is now 3458203648.
2025-12-04T13:38:32.2744957Z 
2025-12-04T13:38:32.2745035Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2745344Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2745346Z 
2025-12-04T13:38:32.2745438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2745441Z 
2025-12-04T13:38:32.2745504Z Process 3 exited with error code 10 and exception:
2025-12-04T13:38:32.2745551Z Traceback (most recent call last):
2025-12-04T13:38:32.2745727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2745771Z getattr(self, test_name)()
2025-12-04T13:38:32.2745965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2746001Z fn()
2025-12-04T13:38:32.2746165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2746207Z method(*args, **kwargs)
2025-12-04T13:38:32.2746371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2746414Z method(*args, **kwargs)
2025-12-04T13:38:32.2746577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2746617Z with policy():
2025-12-04T13:38:32.2746779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2746824Z raise RuntimeError(msg)
2025-12-04T13:38:32.2747256Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784.
2025-12-04T13:38:32.2747259Z 
2025-12-04T13:38:32.2747348Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2747656Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2747669Z 
2025-12-04T13:38:32.2747762Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2747830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T13:38:32.2747899Z ======================= 1 failed, 32 deselected in 7.56s =======================
2025-12-04T13:38:32.2747939Z Got exit code 1
2025-12-04T13:38:32.2747983Z Retrying single test...
2025-12-04T13:38:32.2748189Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b6b47062a3b6547f.xml
2025-12-04T13:38:32.2748255Z ============================= test session starts ==============================
2025-12-04T13:38:32.2748378Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:38:32.2748423Z cachedir: .pytest_cache
2025-12-04T13:38:32.2748595Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:38:32.2748644Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:38:32.2748688Z configfile: pytest.ini
2025-12-04T13:38:32.2748866Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:38:32.2748959Z collecting ... collected 60 items / 32 deselected / 28 selected
2025-12-04T13:38:32.2749262Z stepcurrent: skipping 27 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2749311Z Running 1 items in this shard
2025-12-04T13:38:32.2749313Z 
2025-12-04T13:38:32.2749726Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda I1204 13:35:28.011000 430077 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 430146
2025-12-04T13:38:32.2749896Z I1204 13:35:28.011000 430077 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 430147
2025-12-04T13:38:32.2750059Z I1204 13:35:28.012000 430077 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 430148
2025-12-04T13:38:32.2750240Z I1204 13:35:28.012000 430077 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 430149
2025-12-04T13:38:32.2750866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:38:32.2750907Z _warn_cpu_init()
2025-12-04T13:38:32.2751526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:38:32.2751566Z _warn_cpu_init()
2025-12-04T13:38:32.2752200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU.
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2752255Z _warn_cpu_init() 2025-12-04T13:38:32.2752867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2752909Z _warn_cpu_init() 2025-12-04T13:38:32.2753224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2753272Z return func(*args, **kwargs) 2025-12-04T13:38:32.2753424Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2753600Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2753926Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2754092Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2754399Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2754614Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2754916Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2755096Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2755405Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2755565Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2755863Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2756012Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2756312Z [rank3]:E1204 13:35:33.758000 
430149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2756483Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2757055Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784. 2025-12-04T13:38:32.2757192Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2757405Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2757852Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2757975Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2758203Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2758381Z [rank3]:E1204 13:35:33.758000 430149 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2758433Z dist init r=3, world=4 2025-12-04T13:38:32.2758583Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2758754Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2759065Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2759232Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2759539Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2759727Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2760027Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2760186Z [rank2]:E1204 13:35:33.762000 430148 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2760485Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2760643Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2760943Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2761087Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2761405Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2761578Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2762147Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 2300575744 and is now 3441426432. 2025-12-04T13:38:32.2762272Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2762483Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2762927Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2763048Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2763291Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2763466Z [rank2]:E1204 13:35:33.762000 430148 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2763508Z dist init r=2, world=4 2025-12-04T13:38:32.2763655Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2763828Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2764138Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2764314Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2764624Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2764755Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2765056Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2765214Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2765517Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2765674Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2765982Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2766137Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2766435Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2766596Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2767168Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 55808 on device 1. CUDA driver allocated memory was 2317352960 and is now 3458203648. 
2025-12-04T13:38:32.2767291Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2767502Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2767959Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2768082Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2768307Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2768484Z [rank1]:E1204 13:35:33.793000 430147 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2768524Z dist init r=1, world=4 2025-12-04T13:38:32.2768672Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2768853Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2769162Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2769326Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2769672Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2769804Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2770104Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2770262Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2770575Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2770745Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2771042Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2771189Z [rank0]:E1204 13:35:33.809000 430146 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2771491Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2771649Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2772218Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 0. CUDA driver allocated memory was 2453667840 and is now 3594518528. 2025-12-04T13:38:32.2772355Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2772564Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2773009Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda 2025-12-04T13:38:32.2773131Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2773356Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2773544Z [rank0]:E1204 13:35:33.809000 430146 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2773586Z dist init r=0, world=4 2025-12-04T13:38:32.2773947Z [rank0]:[W1204 13:35:34.085905987 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T13:38:32.2773988Z FAILED [7.3165s] [100%]
2025-12-04T13:38:32.2773991Z 
2025-12-04T13:38:32.2774051Z =================================== FAILURES ===================================
2025-12-04T13:38:32.2774213Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda _
2025-12-04T13:38:32.2774262Z Traceback (most recent call last):
2025-12-04T13:38:32.2774439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T13:38:32.2774485Z self._join_processes(fn)
2025-12-04T13:38:32.2774671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T13:38:32.2774728Z self._check_return_codes(fn, elapsed_time)
2025-12-04T13:38:32.2774931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T13:38:32.2774988Z raise RuntimeError(error)
2025-12-04T13:38:32.2775072Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:38:32.2775119Z Traceback (most recent call last):
2025-12-04T13:38:32.2775292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2775337Z getattr(self, test_name)()
2025-12-04T13:38:32.2775509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2775546Z fn()
2025-12-04T13:38:32.2775707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2775750Z method(*args, **kwargs)
2025-12-04T13:38:32.2775910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2775953Z method(*args, **kwargs)
2025-12-04T13:38:32.2776115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2776154Z with policy():
2025-12-04T13:38:32.2776317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2776360Z raise RuntimeError(msg)
2025-12-04T13:38:32.2776802Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784.
2025-12-04T13:38:32.2776804Z 
2025-12-04T13:38:32.2776883Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2777192Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2777196Z 
2025-12-04T13:38:32.2777286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2777289Z 
2025-12-04T13:38:32.2777291Z 
2025-12-04T13:38:32.2777371Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T13:38:32.2777482Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T13:38:32.2777735Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b6b47062a3b6547f.xml -
2025-12-04T13:38:32.2777799Z =========================== short test summary info ============================
2025-12-04T13:38:32.2778117Z FAILED [7.3165s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T13:38:32.2778167Z Traceback (most recent call last):
2025-12-04T13:38:32.2778341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T13:38:32.2778386Z getattr(self, test_name)()
2025-12-04T13:38:32.2778556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T13:38:32.2778594Z fn()
2025-12-04T13:38:32.2778757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2778800Z method(*args, **kwargs)
2025-12-04T13:38:32.2778973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T13:38:32.2779016Z method(*args, **kwargs)
2025-12-04T13:38:32.2779177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T13:38:32.2779228Z with policy():
2025-12-04T13:38:32.2779391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T13:38:32.2779434Z raise RuntimeError(msg)
2025-12-04T13:38:32.2779920Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 2250244096 and is now 3391094784.
2025-12-04T13:38:32.2779923Z 
2025-12-04T13:38:32.2780001Z To execute this test, run the following from the base repo dir:
2025-12-04T13:38:32.2780311Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2780314Z 
2025-12-04T13:38:32.2780406Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T13:38:32.2780474Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T13:38:32.2780538Z ======================= 1 failed, 32 deselected in 7.47s =======================
2025-12-04T13:38:32.2780594Z Got exit code 1
2025-12-04T13:38:32.2780843Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda
2025-12-04T13:38:32.2780980Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T13:38:32.2781182Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0ec8b517b4caa4f.xml
2025-12-04T13:38:32.2781245Z ============================= test session starts ==============================
2025-12-04T13:38:32.2781364Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T13:38:32.2781407Z cachedir: .pytest_cache
2025-12-04T13:38:32.2781576Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:38:32.2781626Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:38:32.2781683Z configfile: pytest.ini
2025-12-04T13:38:32.2781856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:38:32.2781934Z collecting ... collected 60 items / 28 deselected / 32 selected
2025-12-04T13:38:32.2781990Z stepcurrent: skipping 28 already run items.
2025-12-04T13:38:32.2782035Z Running 5 items in this shard
2025-12-04T13:38:32.2782038Z 
2025-12-04T13:38:32.2782411Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda I1204 13:35:37.969000 430479 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 430548
2025-12-04T13:38:32.2782578Z I1204 13:35:37.970000 430479 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 430549
2025-12-04T13:38:32.2782742Z I1204 13:35:37.970000 430479 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 430550
2025-12-04T13:38:32.2782905Z I1204 13:35:37.971000 430479 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 430551
2025-12-04T13:38:32.2783550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:38:32.2783605Z _warn_cpu_init()
2025-12-04T13:38:32.2784217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T13:38:32.2784256Z _warn_cpu_init() 2025-12-04T13:38:32.2784865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2784904Z _warn_cpu_init() 2025-12-04T13:38:32.2785219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2785278Z return func(*args, **kwargs) 2025-12-04T13:38:32.2785893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2785932Z _warn_cpu_init() 2025-12-04T13:38:32.2786084Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2786256Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2786583Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2786749Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2787053Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2787186Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2787483Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2787641Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2787952Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2788107Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2788415Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2788560Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2788856Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2789014Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2789567Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 2025-12-04T13:38:32.2789759Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2789982Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2790407Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2790528Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2790748Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2790920Z [rank0]:E1204 13:35:43.954000 430548 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2790973Z dist init r=0, world=4 2025-12-04T13:38:32.2791117Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2791285Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2791587Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2791747Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2792045Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2792176Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2792479Z [rank2]:E1204 13:35:43.998000 430550 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2792634Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2792939Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2793094Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2793383Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2793526Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2793819Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2793973Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2794515Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 
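The UserWarning repeated above recommends passing `device_id` to FSDP so sharding initialization runs on GPU (and so `sync_module_states=True` has a GPU-resident module to broadcast). A minimal sketch of that fix; the toy module, rank wiring, and process-group setup below are hypothetical and not taken from the failing test:

```python
# Minimal sketch of the fix the UserWarning above recommends: pass `device_id` so FSDP
# moves the module to the GPU before running sharding initialization. The toy module,
# rank wiring, and process-group setup are hypothetical, not taken from the test.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # assumes RANK/WORLD_SIZE/MASTER_ADDR env vars are set
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Linear(16, 16)  # still on CPU, which is what triggers the warning
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),  # shard init runs on GPU; mutes the warning
    sync_module_states=True,                # needs the module on GPU, per the warning text
)
```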
2025-12-04T13:38:32.2794657Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2794863Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2795287Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2795406Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2795641Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2795814Z [rank2]:E1204 13:35:43.998000 430550 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2795854Z dist init r=2, world=4 2025-12-04T13:38:32.2795996Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2796163Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2796464Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2796627Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2796936Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2797065Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2797364Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2797516Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2797809Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2797962Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2798250Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2798391Z [rank1]:E1204 13:35:44.007000 430549 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2798680Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2798845Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2799381Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2799500Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2799744Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2800176Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2800298Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2800513Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2800678Z [rank1]:E1204 13:35:44.007000 430549 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2800715Z dist init r=1, world=4 2025-12-04T13:38:32.2800851Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2801010Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2801312Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2801464Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2801759Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2801882Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2802159Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T13:38:32.2802306Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2802584Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2802731Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2803006Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2803154Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2803433Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2803579Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2804096Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:38:32.2804221Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2804414Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2804814Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2804927Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2805137Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2805301Z [rank3]:E1204 13:35:44.011000 430551 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2805338Z dist init r=3, world=4 2025-12-04T13:38:32.2805680Z [rank0]:[W1204 13:35:44.128536110 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2805719Z FAILED [7.6170s] [ 20%] 2025-12-04T13:38:32.2805730Z 2025-12-04T13:38:32.2805784Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2805921Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda _ 2025-12-04T13:38:32.2805966Z Traceback (most recent call last): 2025-12-04T13:38:32.2806129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2806173Z self._join_processes(fn) 2025-12-04T13:38:32.2806345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2806398Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2806576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2806619Z raise RuntimeError(error) 2025-12-04T13:38:32.2806697Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2806742Z Traceback (most recent call last): 2025-12-04T13:38:32.2806902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2806943Z getattr(self, test_name)() 2025-12-04T13:38:32.2807118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2807152Z fn() 2025-12-04T13:38:32.2807303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2807343Z method(*args, **kwargs) 2025-12-04T13:38:32.2807494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2807533Z method(*args, **kwargs) 2025-12-04T13:38:32.2807683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2807719Z with policy(): 2025-12-04T13:38:32.2807871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2807911Z raise RuntimeError(msg) 2025-12-04T13:38:32.2808311Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2808314Z 2025-12-04T13:38:32.2808389Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2808662Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2808665Z 2025-12-04T13:38:32.2808751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2808753Z 2025-12-04T13:38:32.2808754Z 2025-12-04T13:38:32.2808829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2808915Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2809152Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c0ec8b517b4caa4f.xml - 2025-12-04T13:38:32.2809210Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2809510Z FAILED [7.6170s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2809611Z Traceback (most recent call last): 2025-12-04T13:38:32.2809776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2809817Z getattr(self, test_name)() 2025-12-04T13:38:32.2809978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2810012Z fn() 2025-12-04T13:38:32.2810164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2810203Z method(*args, **kwargs) 2025-12-04T13:38:32.2810353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2810392Z method(*args, **kwargs) 2025-12-04T13:38:32.2810541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2810578Z with policy(): 2025-12-04T13:38:32.2810730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2810770Z raise RuntimeError(msg) 2025-12-04T13:38:32.2811163Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 2025-12-04T13:38:32.2811179Z 2025-12-04T13:38:32.2811252Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2811525Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2811528Z 2025-12-04T13:38:32.2811614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2811676Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2811737Z ======================= 1 failed, 28 deselected in 7.76s ======================= 2025-12-04T13:38:32.2811774Z Got exit code 1 2025-12-04T13:38:32.2811813Z Retrying single test... 2025-12-04T13:38:32.2812018Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f06e3aa3004e1f2.xml 2025-12-04T13:38:32.2812074Z ============================= test session starts ============================== 2025-12-04T13:38:32.2812188Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2812227Z cachedir: .pytest_cache 2025-12-04T13:38:32.2812385Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2812431Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2812471Z configfile: pytest.ini 2025-12-04T13:38:32.2812632Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2812705Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2812976Z stepcurrent: skipping 28 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2813018Z Running 1 items in this shard 2025-12-04T13:38:32.2813020Z 2025-12-04T13:38:32.2813378Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda I1204 13:35:48.295000 430881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 430950 2025-12-04T13:38:32.2813545Z I1204 13:35:48.296000 430881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 430951 2025-12-04T13:38:32.2813696Z I1204 13:35:48.296000 430881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 430952 2025-12-04T13:38:32.2813845Z I1204 13:35:48.297000 430881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 430953 2025-12-04T13:38:32.2814427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2814464Z _warn_cpu_init() 2025-12-04T13:38:32.2815028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2815075Z _warn_cpu_init() 2025-12-04T13:38:32.2815637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2815674Z _warn_cpu_init() 2025-12-04T13:38:32.2816247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2816284Z _warn_cpu_init() 2025-12-04T13:38:32.2816577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2816620Z return func(*args, **kwargs) 2025-12-04T13:38:32.2816761Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2816922Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2817210Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2817365Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2817660Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2817784Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2818072Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2818219Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2818501Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2818648Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2818928Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2819064Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2819344Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2819500Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2820053Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2820172Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2820369Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2820783Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2820897Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2821108Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2821273Z [rank1]:E1204 13:35:54.222000 430951 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2821311Z dist init r=1, world=4 2025-12-04T13:38:32.2821447Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2821609Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2821907Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2822060Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2822355Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2822477Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2822753Z [rank0]:E1204 13:35:54.237000 430950 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2822900Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2823177Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2823324Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2823603Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2823751Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2824027Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2824173Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2824688Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2824814Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2825007Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2825407Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2825519Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2825732Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2825897Z [rank0]:E1204 13:35:54.237000 430950 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2825934Z dist init r=0, world=4 2025-12-04T13:38:32.2826088Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2826247Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2826544Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2826697Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2826983Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2827106Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2827385Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2827530Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2827806Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2827964Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2828242Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2828378Z [rank2]:E1204 13:35:54.296000 430952 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2828659Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2828806Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2829336Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:38:32.2829451Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2829686Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2830081Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2830196Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2830422Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2830588Z [rank2]:E1204 13:35:54.296000 430952 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2830637Z dist init r=2, world=4 2025-12-04T13:38:32.2830773Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2830930Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2831217Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2831369Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2831651Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2831773Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2832047Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T13:38:32.2832206Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2832480Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2832626Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2832908Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2833044Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2833332Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2833478Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2833993Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:38:32.2834106Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2834300Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2834712Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2834823Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2835045Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2835208Z [rank3]:E1204 13:35:54.300000 430953 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2835250Z dist init r=3, world=4 2025-12-04T13:38:32.2835583Z [rank0]:[W1204 13:35:54.429063152 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2835621Z FAILED [7.6167s] [100%] 2025-12-04T13:38:32.2835623Z 2025-12-04T13:38:32.2835679Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2835817Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda _ 2025-12-04T13:38:32.2835864Z Traceback (most recent call last): 2025-12-04T13:38:32.2836026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2836069Z self._join_processes(fn) 2025-12-04T13:38:32.2836240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2836304Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2836481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2836525Z raise RuntimeError(error) 2025-12-04T13:38:32.2836603Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2836647Z Traceback (most recent call last): 2025-12-04T13:38:32.2836807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2836849Z getattr(self, test_name)() 2025-12-04T13:38:32.2837007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2837041Z fn() 2025-12-04T13:38:32.2837192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2837243Z method(*args, **kwargs) 2025-12-04T13:38:32.2837394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2837434Z method(*args, **kwargs) 2025-12-04T13:38:32.2837584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2837620Z with policy(): 2025-12-04T13:38:32.2837773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2837814Z raise RuntimeError(msg) 2025-12-04T13:38:32.2838205Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2838209Z 2025-12-04T13:38:32.2838282Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2838566Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2838568Z 2025-12-04T13:38:32.2838654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2838666Z 2025-12-04T13:38:32.2838725Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2838768Z Traceback (most recent call last): 2025-12-04T13:38:32.2838932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2838972Z getattr(self, test_name)() 2025-12-04T13:38:32.2839133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2839167Z fn() 2025-12-04T13:38:32.2839317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2839357Z method(*args, **kwargs) 2025-12-04T13:38:32.2839506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2839546Z method(*args, **kwargs) 2025-12-04T13:38:32.2839727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2839763Z with policy(): 2025-12-04T13:38:32.2839914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2839955Z raise RuntimeError(msg) 2025-12-04T13:38:32.2840362Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2840364Z 2025-12-04T13:38:32.2840439Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2840711Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2840714Z 2025-12-04T13:38:32.2840799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2840801Z 2025-12-04T13:38:32.2840803Z 2025-12-04T13:38:32.2840877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2840963Z Process 0 terminated with exit code 10, terminating remaining processes. 
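Two other warnings recur in this log: the `barrier()` notice suggesting `device_id` in `init_process_group`, and the ProcessGroupNCCL message that `destroy_process_group()` was not called before exit. A rough sketch of both fixes, assuming a per-process `LOCAL_RANK` environment variable (hypothetical wiring, not the test harness):

```python
# Sketch addressing the two process-group warnings in this log: bind the group to a
# device at init time (mutes the barrier() warning) and tear it down before exit
# (avoids the ProcessGroupNCCL shutdown warning). Rank wiring here is hypothetical.
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device("cuda", local_rank)
dist.init_process_group("nccl", device_id=device)  # device-bound group, per the c10d hint
try:
    dist.barrier()  # no longer warns about "using the device under current context"
    # ... workload ...
finally:
    dist.destroy_process_group()  # released explicitly, as the NCCL warning asks
```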
2025-12-04T13:38:32.2841222Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6f06e3aa3004e1f2.xml - 2025-12-04T13:38:32.2841284Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2841570Z FAILED [7.6167s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2841616Z Traceback (most recent call last): 2025-12-04T13:38:32.2841780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2841821Z getattr(self, test_name)() 2025-12-04T13:38:32.2841981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2842015Z fn() 2025-12-04T13:38:32.2842167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2842205Z method(*args, **kwargs) 2025-12-04T13:38:32.2842373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2842412Z method(*args, **kwargs) 2025-12-04T13:38:32.2842563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2842611Z with policy(): 2025-12-04T13:38:32.2842761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2842801Z raise RuntimeError(msg) 2025-12-04T13:38:32.2843196Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2843199Z 2025-12-04T13:38:32.2843273Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2843544Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2843546Z 2025-12-04T13:38:32.2843632Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2843634Z 2025-12-04T13:38:32.2843690Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2843735Z Traceback (most recent call last): 2025-12-04T13:38:32.2843895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2843947Z getattr(self, test_name)() 2025-12-04T13:38:32.2844105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2844138Z fn() 2025-12-04T13:38:32.2844289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2844328Z method(*args, **kwargs) 2025-12-04T13:38:32.2844478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2844518Z method(*args, **kwargs) 2025-12-04T13:38:32.2844667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2844702Z with policy(): 2025-12-04T13:38:32.2844852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2844901Z raise RuntimeError(msg) 2025-12-04T13:38:32.2845291Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2845293Z 2025-12-04T13:38:32.2845365Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2845637Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2845639Z 2025-12-04T13:38:32.2845724Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2845787Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2845848Z ======================= 1 failed, 32 deselected in 7.77s ======================= 2025-12-04T13:38:32.2845885Z Got exit code 1 2025-12-04T13:38:32.2845924Z Retrying single test... 
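The failure itself comes from the memory-leak check enabled by `PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1`, which compares caching-allocator and driver-level memory before and after the test (the numbers quoted in the RuntimeError above). A rough, simplified illustration of that before/after comparison; this is not the actual checker in `torch/testing/_internal/common_utils.py`:

```python
# Rough illustration of the before/after comparison behind the "confirmed a leak" error
# above; not the actual implementation in torch/testing/_internal/common_utils.py.
import torch

def run_with_leak_check(fn, device=0):
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    allocator_before = torch.cuda.memory_allocated(device)       # caching-allocator bytes
    driver_free_before, _ = torch.cuda.mem_get_info(device)      # driver-level view

    fn()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    allocator_after = torch.cuda.memory_allocated(device)
    driver_free_after, _ = torch.cuda.mem_get_info(device)
    if allocator_after > allocator_before or driver_free_after < driver_free_before:
        raise RuntimeError(
            f"possible leak: caching allocator {allocator_before} -> {allocator_after} bytes, "
            f"driver free memory {driver_free_before} -> {driver_free_after} bytes"
        )
```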
2025-12-04T13:38:32.2846125Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f99384b36d0e58a6.xml 2025-12-04T13:38:32.2846186Z ============================= test session starts ============================== 2025-12-04T13:38:32.2846298Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2846351Z cachedir: .pytest_cache 2025-12-04T13:38:32.2846510Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2846555Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2846593Z configfile: pytest.ini 2025-12-04T13:38:32.2846759Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2846831Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2847099Z stepcurrent: skipping 28 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2847141Z Running 1 items in this shard 2025-12-04T13:38:32.2847143Z 2025-12-04T13:38:32.2847490Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda I1204 13:35:58.654000 431283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 431352 2025-12-04T13:38:32.2847645Z I1204 13:35:58.654000 431283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 431353 2025-12-04T13:38:32.2847807Z I1204 13:35:58.655000 431283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 431354 2025-12-04T13:38:32.2847960Z I1204 13:35:58.656000 431283 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 431355 2025-12-04T13:38:32.2848540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2848579Z _warn_cpu_init() 2025-12-04T13:38:32.2849160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2849199Z _warn_cpu_init() 2025-12-04T13:38:32.2849797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2849833Z _warn_cpu_init() 2025-12-04T13:38:32.2850127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2850169Z return func(*args, **kwargs) 2025-12-04T13:38:32.2850747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2850796Z _warn_cpu_init() 2025-12-04T13:38:32.2850938Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2851101Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2851393Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2851550Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2851835Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2851959Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2852242Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2852403Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2852681Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2852829Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2853107Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2853243Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2853534Z [rank1]:E1204 13:36:04.543000 
431353 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2853682Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2854197Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2854312Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2854507Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2854913Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2855035Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2855244Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2855408Z [rank1]:E1204 13:36:04.543000 431353 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2855449Z dist init r=1, world=4 2025-12-04T13:38:32.2855585Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2855744Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2856033Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2856186Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2856472Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2856610Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2856887Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2857032Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.2857311Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2857459Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2857746Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2857882Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2858158Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2858306Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2858822Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:38:32.2858946Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2859140Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2859547Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2859701Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2859911Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2860076Z [rank3]:E1204 13:36:04.589000 431355 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2860113Z dist init r=3, world=4 2025-12-04T13:38:32.2860250Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2860409Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2860698Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2860868Z [rank0]:E1204 13:36:04.590000 431352 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2861151Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2861273Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2861548Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2861694Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2861982Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2862128Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2862402Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2862537Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2862817Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2862965Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2863492Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 16896 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2863619Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2863811Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2864212Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2864325Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2864535Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2864698Z [rank0]:E1204 13:36:04.590000 431352 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2864738Z dist init r=0, world=4 2025-12-04T13:38:32.2864873Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2865043Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2865333Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2865485Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2865768Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2865890Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2866176Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2866321Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2866598Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2866744Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2867019Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2867157Z [rank2]:E1204 13:36:04.599000 431354 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2867444Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2867592Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2868116Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:38:32.2868232Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2868430Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2868829Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2868942Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2869151Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2869324Z [rank2]:E1204 13:36:04.599000 431354 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2869361Z dist init r=2, world=4 2025-12-04T13:38:32.2869721Z [rank0]:[W1204 13:36:04.856590047 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2869759Z FAILED [7.5162s] [100%] 2025-12-04T13:38:32.2869763Z 2025-12-04T13:38:32.2869818Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2869958Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda _ 2025-12-04T13:38:32.2870003Z Traceback (most recent call last): 2025-12-04T13:38:32.2870168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2870227Z self._join_processes(fn) 2025-12-04T13:38:32.2870401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2870453Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2870632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2870675Z raise RuntimeError(error) 2025-12-04T13:38:32.2870755Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2870798Z Traceback (most recent call last): 2025-12-04T13:38:32.2870960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2871002Z getattr(self, test_name)() 2025-12-04T13:38:32.2871163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2871196Z fn() 2025-12-04T13:38:32.2871346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2871387Z method(*args, **kwargs) 2025-12-04T13:38:32.2871550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2871589Z method(*args, **kwargs) 2025-12-04T13:38:32.2871758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2871794Z with policy(): 2025-12-04T13:38:32.2871948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2871987Z raise RuntimeError(msg) 2025-12-04T13:38:32.2872383Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
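The leak-check failure above is driven by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, which compares device memory counters recorded before and after the test. A minimal sketch of that idea only (not the actual torch.testing._internal harness; run_with_leak_check and its simple greater-than comparison are illustrative assumptions):

import torch

def run_with_leak_check(test_fn, device=0):
    # Illustrative only: snapshot caching-allocator usage before the test ...
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)
    test_fn()
    # ... and compare after it; any growth is reported as a possible leak,
    # mirroring the "allocated memory was X and is now Y" message in the log.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible leak on device {device}: allocator memory was {before} and is now {after}"
        )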
2025-12-04T13:38:32.2872386Z 2025-12-04T13:38:32.2872462Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2872735Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2872738Z 2025-12-04T13:38:32.2872826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2872829Z 2025-12-04T13:38:32.2872830Z 2025-12-04T13:38:32.2872903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2873008Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2873242Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f99384b36d0e58a6.xml - 2025-12-04T13:38:32.2873302Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2873592Z FAILED [7.5162s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2873637Z Traceback (most recent call last): 2025-12-04T13:38:32.2873801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2873842Z getattr(self, test_name)() 2025-12-04T13:38:32.2874002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2874035Z fn() 2025-12-04T13:38:32.2874199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2874237Z method(*args, **kwargs) 2025-12-04T13:38:32.2874390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2874428Z method(*args, **kwargs) 2025-12-04T13:38:32.2874579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2874617Z with policy(): 2025-12-04T13:38:32.2874770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2874809Z raise RuntimeError(msg) 2025-12-04T13:38:32.2875203Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2875206Z 2025-12-04T13:38:32.2875278Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2875562Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2875573Z 2025-12-04T13:38:32.2875659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2875723Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2875786Z ======================= 1 failed, 32 deselected in 7.68s ======================= 2025-12-04T13:38:32.2878374Z Got exit code 1 2025-12-04T13:38:32.2878604Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda 2025-12-04T13:38:32.2878734Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2878922Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-04695ebb741ba6d1.xml 2025-12-04T13:38:32.2878981Z ============================= test session starts ============================== 2025-12-04T13:38:32.2879095Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2879137Z cachedir: .pytest_cache 2025-12-04T13:38:32.2879295Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2879341Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2879398Z configfile: pytest.ini 2025-12-04T13:38:32.2879561Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2879671Z collecting ... collected 60 items / 29 deselected / 31 selected 2025-12-04T13:38:32.2879724Z stepcurrent: skipping 29 already run items. 2025-12-04T13:38:32.2879767Z Running 4 items in this shard 2025-12-04T13:38:32.2879770Z 2025-12-04T13:38:32.2880131Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 13:36:08.755000 431685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 431754 2025-12-04T13:38:32.2880288Z I1204 13:36:08.756000 431685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 431755 2025-12-04T13:38:32.2880439Z I1204 13:36:08.756000 431685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 431756 2025-12-04T13:38:32.2880611Z I1204 13:36:08.757000 431685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 431757 2025-12-04T13:38:32.2881195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2881234Z _warn_cpu_init() 2025-12-04T13:38:32.2881806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
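The UserWarning above recommends giving FSDP a device_id so sharding initialization runs on GPU and sync_module_states=True can perform its broadcast. A minimal sketch of that usage, assuming the process group is already initialized (as the test harness does) and that a torchrun-style launcher exported LOCAL_RANK; nn.Linear stands in for the test model:

import os
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is already initialized and LOCAL_RANK is set by the launcher.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

module = nn.Linear(8, 8)  # stand-in for the test model, still on CPU
fsdp_module = FSDP(
    module,
    device_id=torch.cuda.current_device(),  # moves `module` to GPU for sharding init
    sync_module_states=True,                # requires GPU residency for the broadcast
)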
2025-12-04T13:38:32.2881845Z _warn_cpu_init() 2025-12-04T13:38:32.2882149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2882191Z return func(*args, **kwargs) 2025-12-04T13:38:32.2882778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2882815Z _warn_cpu_init() 2025-12-04T13:38:32.2883386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2883423Z _warn_cpu_init() 2025-12-04T13:38:32.2883565Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2883726Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2884028Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2884184Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2884469Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2884595Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2884872Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2885030Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2885307Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2885452Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2885728Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2885862Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2886142Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2886289Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2886837Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:38:32.2886961Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2887158Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2887574Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2887689Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2887901Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2888065Z [rank3]:E1204 13:36:14.649000 431757 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2888114Z dist init r=3, world=4 2025-12-04T13:38:32.2888252Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2888412Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2888702Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2888857Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2889141Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2889275Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2889552Z [rank2]:E1204 13:36:14.651000 
431756 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2889717Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2889991Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2890137Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2890411Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2890548Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2890845Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2891005Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2891541Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 
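The barrier() UserWarning repeated throughout this run points at the device_id argument of init_process_group. A minimal sketch of that suggestion, assuming a torchrun-style launch (env:// rendezvous, LOCAL_RANK exported) and a PyTorch build recent enough to accept device_id; on these ROCm runners the "nccl" backend maps to RCCL:

import os
import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Binding the process group to an explicit device lets barrier() use that device
# instead of inferring one from the current context (which emits the warning above).
dist.init_process_group(
    backend="nccl",
    device_id=torch.device("cuda", local_rank),
)
dist.barrier()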
2025-12-04T13:38:32.2891655Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2891851Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2892261Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2892374Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2892601Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2892763Z [rank2]:E1204 13:36:14.651000 431756 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2892803Z dist init r=2, world=4 2025-12-04T13:38:32.2892938Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2893098Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2893384Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2893551Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2893835Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2893958Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2894232Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2894378Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2894654Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2894799Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2895083Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2895225Z [rank0]:E1204 13:36:14.716000 431754 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2895500Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2895649Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2896178Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 2025-12-04T13:38:32.2896291Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2896485Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2896908Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2897020Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2897229Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2897393Z [rank0]:E1204 13:36:14.716000 431754 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2897430Z dist init r=0, world=4 2025-12-04T13:38:32.2897566Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2897734Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2898020Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2898174Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2898459Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2898580Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2898857Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T13:38:32.2899002Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2899285Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2899442Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2899761Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2899899Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2900177Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2900324Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2900855Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2900982Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2901177Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2901586Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2901699Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2901909Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2902087Z [rank1]:E1204 13:36:14.729000 431755 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2902125Z dist init r=1, world=4 2025-12-04T13:38:32.2902459Z [rank0]:[W1204 13:36:15.087724731 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2902498Z FAILED [7.5158s] [ 25%] 2025-12-04T13:38:32.2902501Z 2025-12-04T13:38:32.2902557Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2902707Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2902752Z Traceback (most recent call last): 2025-12-04T13:38:32.2902915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2902959Z self._join_processes(fn) 2025-12-04T13:38:32.2903131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2903184Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2903375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2903418Z raise RuntimeError(error) 2025-12-04T13:38:32.2903510Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2903554Z Traceback (most recent call last): 2025-12-04T13:38:32.2903715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2903755Z getattr(self, test_name)() 2025-12-04T13:38:32.2903916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2903950Z fn() 2025-12-04T13:38:32.2904101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2904141Z method(*args, **kwargs) 2025-12-04T13:38:32.2904293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2904333Z method(*args, **kwargs) 2025-12-04T13:38:32.2904485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2904521Z with policy(): 2025-12-04T13:38:32.2904673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2904713Z raise RuntimeError(msg) 2025-12-04T13:38:32.2905125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 
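The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") can be avoided with an explicit teardown. A minimal sketch, assuming the usual init-then-cleanup structure around whatever the test or training body is:

import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    try:
        pass  # training or test body goes here
    finally:
        # Tear the process group down explicitly so communicator resources are
        # released before interpreter exit, avoiding the warning in the log.
        dist.destroy_process_group()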
2025-12-04T13:38:32.2905128Z 2025-12-04T13:38:32.2905203Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2905488Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2905491Z 2025-12-04T13:38:32.2905579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2905581Z 2025-12-04T13:38:32.2905583Z 2025-12-04T13:38:32.2905659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2905746Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2905994Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-04695ebb741ba6d1.xml - 2025-12-04T13:38:32.2906055Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2906350Z FAILED [7.5158s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.2906395Z Traceback (most recent call last): 2025-12-04T13:38:32.2906558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2906599Z getattr(self, test_name)() 2025-12-04T13:38:32.2906759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2906795Z fn() 2025-12-04T13:38:32.2906949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2906988Z method(*args, **kwargs) 2025-12-04T13:38:32.2907149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2907188Z method(*args, **kwargs) 2025-12-04T13:38:32.2907340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2907387Z with policy(): 2025-12-04T13:38:32.2907538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2907578Z raise RuntimeError(msg) 2025-12-04T13:38:32.2907978Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 23040 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:38:32.2907981Z 2025-12-04T13:38:32.2908054Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2908335Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2908338Z 2025-12-04T13:38:32.2908424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2908486Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2908547Z ======================= 1 failed, 29 deselected in 7.68s ======================= 2025-12-04T13:38:32.2908593Z Got exit code 1 2025-12-04T13:38:32.2908633Z Retrying single test... 2025-12-04T13:38:32.2908821Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bcef664f761bd15e.xml 2025-12-04T13:38:32.2908879Z ============================= test session starts ============================== 2025-12-04T13:38:32.2908991Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2909031Z cachedir: .pytest_cache 2025-12-04T13:38:32.2909189Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2909235Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2909274Z configfile: pytest.ini 2025-12-04T13:38:32.2909436Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2909511Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2909839Z stepcurrent: skipping 29 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2909883Z Running 1 items in this shard 2025-12-04T13:38:32.2909885Z 2025-12-04T13:38:32.2910239Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 13:36:19.003000 432087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 432156 2025-12-04T13:38:32.2910395Z I1204 13:36:19.004000 432087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 432157 2025-12-04T13:38:32.2910545Z I1204 13:36:19.005000 432087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 432158 2025-12-04T13:38:32.2910696Z I1204 13:36:19.005000 432087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 432159 2025-12-04T13:38:32.2911296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2911346Z _warn_cpu_init() 2025-12-04T13:38:32.2911921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.2911958Z _warn_cpu_init() 2025-12-04T13:38:32.2912526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2912563Z _warn_cpu_init() 2025-12-04T13:38:32.2913128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2913177Z _warn_cpu_init() 2025-12-04T13:38:32.2913467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.2913509Z return func(*args, **kwargs) 2025-12-04T13:38:32.2913650Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2913811Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2914111Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2914268Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2914554Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2914678Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2914957Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2915106Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2915386Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2915542Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2915815Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2915961Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2916236Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2916385Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2916914Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:38:32.2917030Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2917227Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2917646Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2917758Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2917967Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2918132Z [rank2]:E1204 13:36:24.924000 432158 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2918170Z dist init r=2, world=4 2025-12-04T13:38:32.2918316Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2918474Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2918763Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2918916Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2919198Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2919323Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2919674Z [rank0]:E1204 13:36:24.931000 
432156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2919835Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2920110Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2920271Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2920547Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2920681Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2920959Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2921106Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2921639Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2921773Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2921966Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2922375Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2922488Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2922715Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2922877Z [rank0]:E1204 13:36:24.931000 432156 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2922915Z dist init r=0, world=4 2025-12-04T13:38:32.2923052Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2923210Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2923501Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2923656Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2923938Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2924069Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2924344Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2924500Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2924776Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2924922Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2925196Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2925333Z [rank3]:E1204 13:36:24.965000 432159 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2925609Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2925769Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2926294Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 2025-12-04T13:38:32.2926408Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2926602Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2927020Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2927133Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2927344Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2927508Z [rank3]:E1204 13:36:24.965000 432159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2927546Z dist init r=3, world=4 2025-12-04T13:38:32.2927682Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2927839Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2928130Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2928293Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2928575Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2928707Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2928982Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T13:38:32.2929129Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2929408Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2929554Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2929855Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2930006Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2930282Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2930433Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2930961Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 20992 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2931075Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2931284Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2931694Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2931806Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2932015Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2932179Z [rank1]:E1204 13:36:24.976000 432157 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2932216Z dist init r=1, world=4 2025-12-04T13:38:32.2932562Z [rank0]:[W1204 13:36:25.125010451 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2932601Z FAILED [7.6174s] [100%] 2025-12-04T13:38:32.2932603Z 2025-12-04T13:38:32.2932658Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2932821Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2932868Z Traceback (most recent call last): 2025-12-04T13:38:32.2933030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2933074Z self._join_processes(fn) 2025-12-04T13:38:32.2933246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2933299Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2933476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2933519Z raise RuntimeError(error) 2025-12-04T13:38:32.2933597Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2933642Z Traceback (most recent call last): 2025-12-04T13:38:32.2933801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2933843Z getattr(self, test_name)() 2025-12-04T13:38:32.2933999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2934044Z fn() 2025-12-04T13:38:32.2934194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2934235Z method(*args, **kwargs) 2025-12-04T13:38:32.2934384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2934424Z method(*args, **kwargs) 2025-12-04T13:38:32.2934574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2934613Z with policy(): 2025-12-04T13:38:32.2934765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2934805Z raise RuntimeError(msg) 2025-12-04T13:38:32.2935216Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2935220Z 2025-12-04T13:38:32.2935294Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2935583Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2935586Z 2025-12-04T13:38:32.2935672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2935674Z 2025-12-04T13:38:32.2935732Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2935775Z Traceback (most recent call last): 2025-12-04T13:38:32.2935938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2935980Z getattr(self, test_name)() 2025-12-04T13:38:32.2936140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2936174Z fn() 2025-12-04T13:38:32.2936334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2936373Z method(*args, **kwargs) 2025-12-04T13:38:32.2936525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2936581Z method(*args, **kwargs) 2025-12-04T13:38:32.2936730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2936766Z with policy(): 2025-12-04T13:38:32.2936918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2936961Z raise RuntimeError(msg) 2025-12-04T13:38:32.2937361Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:38:32.2937363Z 2025-12-04T13:38:32.2937437Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2937724Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2937726Z 2025-12-04T13:38:32.2937812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2937827Z 2025-12-04T13:38:32.2937828Z 2025-12-04T13:38:32.2937904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2937994Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.2938229Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bcef664f761bd15e.xml - 2025-12-04T13:38:32.2938293Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2938587Z FAILED [7.6174s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2938635Z Traceback (most recent call last): 2025-12-04T13:38:32.2938798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2938844Z getattr(self, test_name)() 2025-12-04T13:38:32.2939016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2939055Z fn() 2025-12-04T13:38:32.2939210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2939254Z method(*args, **kwargs) 2025-12-04T13:38:32.2939408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2939451Z method(*args, **kwargs) 2025-12-04T13:38:32.2939641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2939682Z with policy(): 2025-12-04T13:38:32.2939838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2939880Z raise RuntimeError(msg) 2025-12-04T13:38:32.2940292Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 
2025-12-04T13:38:32.2940295Z 2025-12-04T13:38:32.2940369Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2940670Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2940673Z 2025-12-04T13:38:32.2940759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2940761Z 2025-12-04T13:38:32.2940824Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.2940869Z Traceback (most recent call last): 2025-12-04T13:38:32.2941033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2941074Z getattr(self, test_name)() 2025-12-04T13:38:32.2941237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2941274Z fn() 2025-12-04T13:38:32.2941424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2941467Z method(*args, **kwargs) 2025-12-04T13:38:32.2941617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2941658Z method(*args, **kwargs) 2025-12-04T13:38:32.2941808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2941865Z with policy(): 2025-12-04T13:38:32.2942018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2942061Z raise RuntimeError(msg) 2025-12-04T13:38:32.2942459Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:38:32.2942462Z 2025-12-04T13:38:32.2942538Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2942819Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2942822Z 2025-12-04T13:38:32.2942923Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2942988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2943048Z ======================= 1 failed, 32 deselected in 7.78s ======================= 2025-12-04T13:38:32.2943089Z Got exit code 1 2025-12-04T13:38:32.2943130Z Retrying single test... 
2025-12-04T13:38:32.2943324Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb3378da80874a3f.xml 2025-12-04T13:38:32.2943383Z ============================= test session starts ============================== 2025-12-04T13:38:32.2943497Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2943537Z cachedir: .pytest_cache 2025-12-04T13:38:32.2943698Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2943746Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2943790Z configfile: pytest.ini 2025-12-04T13:38:32.2943953Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2944039Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.2944319Z stepcurrent: skipping 29 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2944377Z Running 1 items in this shard 2025-12-04T13:38:32.2944379Z 2025-12-04T13:38:32.2944734Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda I1204 13:36:29.118000 432489 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 432558 2025-12-04T13:38:32.2944892Z I1204 13:36:29.119000 432489 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 432559 2025-12-04T13:38:32.2945048Z I1204 13:36:29.119000 432489 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 432560 2025-12-04T13:38:32.2945200Z I1204 13:36:29.120000 432489 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 432561 2025-12-04T13:38:32.2945785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2945833Z _warn_cpu_init() 2025-12-04T13:38:32.2946406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2946446Z _warn_cpu_init() 2025-12-04T13:38:32.2946742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.2946787Z return func(*args, **kwargs) 2025-12-04T13:38:32.2947369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2947409Z _warn_cpu_init() 2025-12-04T13:38:32.2947974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2948015Z _warn_cpu_init() 2025-12-04T13:38:32.2948161Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2948321Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2948623Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2948791Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2949082Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2949208Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2949492Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2949683Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2949962Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2950111Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2950392Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2950542Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2950820Z [rank1]:E1204 
13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2950970Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2951523Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2951639Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2951836Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2952249Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2952365Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2952577Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2952743Z [rank1]:E1204 13:36:35.113000 432559 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2952784Z dist init r=1, world=4 2025-12-04T13:38:32.2952939Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2953113Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2953401Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2953559Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2953843Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2953969Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2954247Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2954395Z [rank2]:E1204 13:36:35.142000 432560 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2954674Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2954832Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2955111Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2955247Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2955527Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2955684Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2956215Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 2. CUDA driver allocated memory was 2300575744 and is now 3418357760. 2025-12-04T13:38:32.2956332Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2956526Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2956937Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2957049Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2957271Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2957446Z [rank2]:E1204 13:36:35.142000 432560 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2957487Z dist init r=2, world=4 2025-12-04T13:38:32.2957626Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2957789Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2958080Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 925, in run_test 2025-12-04T13:38:32.2958235Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2958522Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2958647Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2958928Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2959085Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2959363Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2959512Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2959819Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2959970Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2960248Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2960399Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2960931Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 25088 on device 3. CUDA driver allocated memory was 2250244096 and is now 3368026112. 
2025-12-04T13:38:32.2961046Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2961245Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2961667Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2961795Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2962004Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2962172Z [rank3]:E1204 13:36:35.148000 432561 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2962209Z dist init r=3, world=4 2025-12-04T13:38:32.2962347Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2962510Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2962799Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2962956Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2963241Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2963382Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2963661Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2963811Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2964093Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2964248Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2964525Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2964661Z [rank0]:E1204 13:36:35.165000 432558 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2964940Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2965088Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2965624Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 18944 on device 0. CUDA driver allocated memory was 2453667840 and is now 3571449856. 2025-12-04T13:38:32.2965751Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2965946Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2966366Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2966479Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2966687Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2966852Z [rank0]:E1204 13:36:35.165000 432558 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2966892Z dist init r=0, world=4 2025-12-04T13:38:32.2967228Z [rank0]:[W1204 13:36:35.429004777 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2967266Z FAILED [7.7164s] [100%] 2025-12-04T13:38:32.2967268Z 2025-12-04T13:38:32.2967326Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2967493Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda _ 2025-12-04T13:38:32.2967542Z Traceback (most recent call last): 2025-12-04T13:38:32.2967708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2967756Z self._join_processes(fn) 2025-12-04T13:38:32.2967929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2967985Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2968163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2968208Z raise RuntimeError(error) 2025-12-04T13:38:32.2968287Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2968336Z Traceback (most recent call last): 2025-12-04T13:38:32.2968515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2968561Z getattr(self, test_name)() 2025-12-04T13:38:32.2968721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2968758Z fn() 2025-12-04T13:38:32.2968910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2968954Z method(*args, **kwargs) 2025-12-04T13:38:32.2969104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2969148Z method(*args, **kwargs) 2025-12-04T13:38:32.2969302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2969340Z with policy(): 2025-12-04T13:38:32.2969498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2969539Z raise RuntimeError(msg) 2025-12-04T13:38:32.2969989Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 
2025-12-04T13:38:32.2970003Z 2025-12-04T13:38:32.2970078Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2970365Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2970368Z 2025-12-04T13:38:32.2970457Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2970459Z 2025-12-04T13:38:32.2970464Z 2025-12-04T13:38:32.2970539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.2970630Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.2970862Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cb3378da80874a3f.xml - 2025-12-04T13:38:32.2970924Z =========================== short test summary info ============================ 2025-12-04T13:38:32.2971226Z FAILED [7.7164s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.2971288Z Traceback (most recent call last): 2025-12-04T13:38:32.2971453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2971497Z getattr(self, test_name)() 2025-12-04T13:38:32.2971657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2971694Z fn() 2025-12-04T13:38:32.2971847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2971889Z method(*args, **kwargs) 2025-12-04T13:38:32.2972041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2972084Z method(*args, **kwargs) 2025-12-04T13:38:32.2972235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2972274Z with policy(): 2025-12-04T13:38:32.2972443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2972485Z raise RuntimeError(msg) 2025-12-04T13:38:32.2972885Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 14848 on device 1. CUDA driver allocated memory was 2317352960 and is now 3435134976. 2025-12-04T13:38:32.2972888Z 2025-12-04T13:38:32.2972964Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2973248Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2973251Z 2025-12-04T13:38:32.2973338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2973402Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.2973463Z ======================= 1 failed, 32 deselected in 7.87s ======================= 2025-12-04T13:38:32.2973503Z Got exit code 1 2025-12-04T13:38:32.2973745Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.2973885Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.2974070Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5126715a40a2f0a6.xml 2025-12-04T13:38:32.2974128Z ============================= test session starts ============================== 2025-12-04T13:38:32.2974245Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.2974286Z cachedir: .pytest_cache 2025-12-04T13:38:32.2974448Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.2974494Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.2974538Z configfile: pytest.ini 2025-12-04T13:38:32.2974701Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.2974778Z collecting ... collected 60 items / 30 deselected / 30 selected 2025-12-04T13:38:32.2974831Z stepcurrent: skipping 30 already run items. 2025-12-04T13:38:32.2974877Z Running 3 items in this shard 2025-12-04T13:38:32.2974878Z 2025-12-04T13:38:32.2975177Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda I1204 13:36:39.371000 432891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 432960 2025-12-04T13:38:32.2975344Z I1204 13:36:39.372000 432891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 432961 2025-12-04T13:38:32.2975495Z I1204 13:36:39.372000 432891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 432962 2025-12-04T13:38:32.2975649Z I1204 13:36:39.373000 432891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 432963 2025-12-04T13:38:32.2976012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.2976064Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.2976428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.2976477Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.2976831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.2976877Z self.encoder = TransformerEncoder( 
2025-12-04T13:38:32.2977233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.2977276Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.2977874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2977916Z _warn_cpu_init() 2025-12-04T13:38:32.2978486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2978536Z _warn_cpu_init() 2025-12-04T13:38:32.2979106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2979146Z _warn_cpu_init() 2025-12-04T13:38:32.2979752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.2979802Z _warn_cpu_init() 2025-12-04T13:38:32.2980097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.2980139Z return func(*args, **kwargs) 2025-12-04T13:38:32.2980285Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2980448Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2980741Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2980912Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2981199Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2981326Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2981602Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2981753Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2982031Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2982181Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2982467Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2982617Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2982896Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2983045Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2983519Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 
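The _warn_cpu_init UserWarnings above recommend passing the device_id argument to FSDP so that sharding initialization runs on the GPU and sync_module_states=True can use GPU communication. A hedged sketch of that wrapping, assuming a process group is already initialized; module and rank are placeholders, not the test's own code.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: torch.nn.Module, rank: int) -> FSDP:
        # device_id moves the CPU-constructed module to the given GPU before
        # sharding, which is what the warning recommends and what
        # sync_module_states=True requires.
        return FSDP(
            module,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )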
2025-12-04T13:38:32.2983634Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2983832Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2984181Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.2984310Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2984524Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2984687Z [rank1]:E1204 13:36:48.547000 432961 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.2984731Z dist init r=1, world=4 2025-12-04T13:38:32.2984869Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2985029Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2985327Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2985483Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2985767Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2985892Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2986172Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2986321Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2986607Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2986753Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2987040Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2987175Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.2987456Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2987607Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2988078Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 0. CUDA driver allocated memory was 2453667840 and is now 4055891968. 2025-12-04T13:38:32.2988195Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2988400Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2988757Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.2988874Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2989085Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2989252Z [rank0]:E1204 13:36:48.548000 432960 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.2989291Z dist init r=0, world=4 2025-12-04T13:38:32.2989439Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2989630Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2989920Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2990072Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.2990360Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2990486Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2990763Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2990927Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.2991217Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2991363Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2991640Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2991777Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2992055Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2992206Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2992677Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 2300575744 and is now 3902799872. 2025-12-04T13:38:32.2992802Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2992999Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2993343Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.2993459Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2993686Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2993850Z [rank2]:E1204 13:36:48.590000 432962 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.2993890Z dist init r=2, world=4 2025-12-04T13:38:32.2994028Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.2994188Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.2994476Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.2994632Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:38:32.2994917Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.2995051Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.2995330Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2995485Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2995764Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.2995912Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.2996192Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.2996328Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.2996606Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.2996754Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.2997235Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 261632 on device 3. CUDA driver allocated memory was 2250244096 and is now 3852468224. 
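The RuntimeError on each rank reports caching-allocator and driver-level memory before and after the test body. The sketch below only illustrates that kind of before/after comparison; it is not the leak-check implementation in common_utils.py, and the zero-byte tolerance is made up.

    import torch

    def check_for_leak(fn, device: int = 0, tolerance_bytes: int = 0) -> None:
        # Caching-allocator view: bytes currently held by live tensors.
        alloc_before = torch.cuda.memory_allocated(device)
        # Driver-level view: bytes in use on the whole device (cudaMemGetInfo).
        free_before, total = torch.cuda.mem_get_info(device)

        fn()
        torch.cuda.synchronize(device)

        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)

        if alloc_after - alloc_before > tolerance_bytes:
            raise RuntimeError(
                f"possible leak on device {device}: allocator memory was "
                f"{alloc_before} and is now {alloc_after} "
                f"(driver in use: {total - free_before} -> {total - free_after})"
            )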
2025-12-04T13:38:32.2997348Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2997541Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.2997883Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.2998004Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.2998214Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.2998377Z [rank3]:E1204 13:36:48.591000 432963 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.2998414Z dist init r=3, world=4 2025-12-04T13:38:32.2998750Z [rank0]:[W1204 13:36:48.751891137 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.2998790Z FAILED [11.1208s] [ 33%] 2025-12-04T13:38:32.2998793Z 2025-12-04T13:38:32.2998849Z =================================== FAILURES =================================== 2025-12-04T13:38:32.2998944Z ________ TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda ________ 2025-12-04T13:38:32.2998989Z Traceback (most recent call last): 2025-12-04T13:38:32.2999161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.2999204Z self._join_processes(fn) 2025-12-04T13:38:32.2999376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.2999446Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.2999662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.2999705Z raise RuntimeError(error) 2025-12-04T13:38:32.2999785Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.2999830Z Traceback (most recent call last): 2025-12-04T13:38:32.2999992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3000034Z getattr(self, test_name)() 2025-12-04T13:38:32.3000196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3000230Z fn() 2025-12-04T13:38:32.3000380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3000421Z method(*args, **kwargs) 2025-12-04T13:38:32.3000573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3000612Z method(*args, **kwargs) 2025-12-04T13:38:32.3000764Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3000813Z with policy(): 2025-12-04T13:38:32.3000967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3001006Z raise RuntimeError(msg) 2025-12-04T13:38:32.3001349Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 0. CUDA driver allocated memory was 2453667840 and is now 4055891968. 2025-12-04T13:38:32.3001353Z 2025-12-04T13:38:32.3001426Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3001642Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3001645Z 2025-12-04T13:38:32.3001732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3001746Z 2025-12-04T13:38:32.3001806Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.3001849Z Traceback (most recent call last): 2025-12-04T13:38:32.3002012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3002054Z getattr(self, test_name)() 2025-12-04T13:38:32.3002213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3002248Z fn() 2025-12-04T13:38:32.3002398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3002437Z method(*args, **kwargs) 2025-12-04T13:38:32.3002590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3002630Z method(*args, **kwargs) 2025-12-04T13:38:32.3002782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3002818Z with policy(): 2025-12-04T13:38:32.3002982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3003023Z raise RuntimeError(msg) 2025-12-04T13:38:32.3003358Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 2025-12-04T13:38:32.3003373Z 2025-12-04T13:38:32.3003447Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3003661Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3003665Z 2025-12-04T13:38:32.3003751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3003753Z 2025-12-04T13:38:32.3003755Z 2025-12-04T13:38:32.3003829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3003915Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.3004147Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5126715a40a2f0a6.xml - 2025-12-04T13:38:32.3004207Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3004443Z FAILED [11.1208s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T13:38:32.3004496Z Traceback (most recent call last): 2025-12-04T13:38:32.3004665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3004706Z getattr(self, test_name)() 2025-12-04T13:38:32.3004870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3004903Z fn() 2025-12-04T13:38:32.3005055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3005095Z method(*args, **kwargs) 2025-12-04T13:38:32.3005245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3005283Z method(*args, **kwargs) 2025-12-04T13:38:32.3005432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3005468Z with policy(): 2025-12-04T13:38:32.3005628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3005669Z raise RuntimeError(msg) 2025-12-04T13:38:32.3006009Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 0. CUDA driver allocated memory was 2453667840 and is now 4055891968. 
2025-12-04T13:38:32.3006012Z 2025-12-04T13:38:32.3006085Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3006298Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3006300Z 2025-12-04T13:38:32.3006385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3006388Z 2025-12-04T13:38:32.3006445Z Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.3006490Z Traceback (most recent call last): 2025-12-04T13:38:32.3006651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3006701Z getattr(self, test_name)() 2025-12-04T13:38:32.3006859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3006904Z fn() 2025-12-04T13:38:32.3007054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3007093Z method(*args, **kwargs) 2025-12-04T13:38:32.3007242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3007282Z method(*args, **kwargs) 2025-12-04T13:38:32.3007433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3007468Z with policy(): 2025-12-04T13:38:32.3007620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3007659Z raise RuntimeError(msg) 2025-12-04T13:38:32.3007995Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 2025-12-04T13:38:32.3007998Z 2025-12-04T13:38:32.3008070Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3008283Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3008298Z 2025-12-04T13:38:32.3008383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3008446Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.3008508Z ====================== 1 failed, 30 deselected in 11.26s ======================= 2025-12-04T13:38:32.3008545Z Got exit code 1 2025-12-04T13:38:32.3008583Z Retrying single test... 
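"Retrying single test..." means the runner re-invokes pytest with only the failing test selected. To reproduce outside CI, the "To execute this test" line from the log can be run directly, or driven from a small helper like the sketch below; the environment variables and test id are copied from the log, the helper itself is hypothetical.

    import os
    import subprocess

    # Same flags the log prints in its repro hint. Setting
    # PYTORCH_PRINT_REPRO_ON_FAILURE=0 instead would suppress that hint.
    env = dict(
        os.environ,
        PYTORCH_TEST_WITH_ROCM="1",
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",
    )

    # Run from the base repo dir, as the log says.
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda",
        ],
        env=env,
        check=False,
    )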
2025-12-04T13:38:32.3008775Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e5bd795f8999e2c0.xml 2025-12-04T13:38:32.3008833Z ============================= test session starts ============================== 2025-12-04T13:38:32.3008947Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3008987Z cachedir: .pytest_cache 2025-12-04T13:38:32.3009149Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3009222Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3009266Z configfile: pytest.ini 2025-12-04T13:38:32.3009428Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3009501Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3009744Z stepcurrent: skipping 30 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3009787Z Running 1 items in this shard 2025-12-04T13:38:32.3009789Z 2025-12-04T13:38:32.3010085Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda I1204 13:36:53.160000 433293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 433362 2025-12-04T13:38:32.3010240Z I1204 13:36:53.160000 433293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 433363 2025-12-04T13:38:32.3010392Z I1204 13:36:53.161000 433293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 433364 2025-12-04T13:38:32.3010557Z I1204 13:36:53.161000 433293 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 433365 2025-12-04T13:38:32.3010915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3010975Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3011327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3011374Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3011726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3011770Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3012119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3012163Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3012743Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3012792Z _warn_cpu_init() 2025-12-04T13:38:32.3013359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3013396Z _warn_cpu_init() 2025-12-04T13:38:32.3013985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3014022Z _warn_cpu_init() 2025-12-04T13:38:32.3014586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3014624Z _warn_cpu_init() 2025-12-04T13:38:32.3014914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T13:38:32.3014956Z return func(*args, **kwargs) 2025-12-04T13:38:32.3015113Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3015276Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3015577Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3015732Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3016022Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3016147Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3016428Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3016574Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3016852Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3017007Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3017284Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3017421Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3017695Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3017842Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3018321Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 
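The c10d_logger warning a few records above ("barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.") points at process-group setup. A hedged sketch of binding the group to a device, assuming a PyTorch version whose init_process_group accepts device_id and that MASTER_ADDR/MASTER_PORT are set in the environment; rank and world_size are placeholders.

    import torch
    import torch.distributed as dist

    def init(rank: int, world_size: int) -> None:
        torch.cuda.set_device(rank)
        # Binding the group to one device per rank is what the warning suggests
        # to silence the "using the device under current context" message.
        dist.init_process_group(
            backend="nccl",  # maps to RCCL on ROCm
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )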
2025-12-04T13:38:32.3018436Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3018631Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3018976Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3019090Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3019309Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3019472Z [rank1]:E1204 13:37:02.389000 433363 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3019519Z dist init r=1, world=4 2025-12-04T13:38:32.3019695Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3019853Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3020143Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3020295Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3020581Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3020705Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3020981Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3021140Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3021416Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3021564Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3021840Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3021974Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3022263Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3022408Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3022874Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 261632 on device 3. CUDA driver allocated memory was 2250244096 and is now 3852468224. 2025-12-04T13:38:32.3022987Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3023186Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3023527Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3023653Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3023864Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3024039Z [rank3]:E1204 13:37:02.441000 433365 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3024077Z dist init r=3, world=4 2025-12-04T13:38:32.3024213Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3024371Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3024656Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3024808Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3025092Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3025215Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3025506Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3025652Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.3025927Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3026072Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3026366Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3026501Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3026782Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3026927Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3027394Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 2453667840 and is now 4055891968. 2025-12-04T13:38:32.3027510Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3027706Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3028063Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3028187Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3028394Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3028559Z [rank0]:E1204 13:37:02.455000 433362 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3028595Z dist init r=0, world=4 2025-12-04T13:38:32.3028733Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3028891Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3029178Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3029330Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:38:32.3029650Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3029771Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3030052Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3030199Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3030473Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3030642Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3030917Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3031052Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3031328Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3031475Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3031941Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 2300575744 and is now 3902799872. 
2025-12-04T13:38:32.3032065Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3032259Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3032616Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3032730Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3032939Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3033103Z [rank2]:E1204 13:37:02.459000 433364 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3033141Z dist init r=2, world=4 2025-12-04T13:38:32.3033473Z [rank0]:[W1204 13:37:02.840032493 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.3033514Z FAILED [11.2221s] [100%] 2025-12-04T13:38:32.3033516Z 2025-12-04T13:38:32.3033570Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3033682Z ________ TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda ________ 2025-12-04T13:38:32.3033726Z Traceback (most recent call last): 2025-12-04T13:38:32.3033891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3033932Z self._join_processes(fn) 2025-12-04T13:38:32.3034106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3034159Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3034336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3034378Z raise RuntimeError(error) 2025-12-04T13:38:32.3034457Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.3034501Z Traceback (most recent call last): 2025-12-04T13:38:32.3034671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3034713Z getattr(self, test_name)() 2025-12-04T13:38:32.3034871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3034905Z fn() 2025-12-04T13:38:32.3035055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3035096Z method(*args, **kwargs) 2025-12-04T13:38:32.3035246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3035285Z method(*args, **kwargs) 2025-12-04T13:38:32.3035434Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3035472Z with policy(): 2025-12-04T13:38:32.3035625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3035665Z raise RuntimeError(msg) 2025-12-04T13:38:32.3036019Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 2025-12-04T13:38:32.3036021Z 2025-12-04T13:38:32.3036108Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3036323Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3036325Z 2025-12-04T13:38:32.3036411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3036414Z 2025-12-04T13:38:32.3036416Z 2025-12-04T13:38:32.3036492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3036578Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.3036813Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e5bd795f8999e2c0.xml - 2025-12-04T13:38:32.3036873Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3037108Z FAILED [11.2221s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.3037153Z Traceback (most recent call last): 2025-12-04T13:38:32.3037317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3037368Z getattr(self, test_name)() 2025-12-04T13:38:32.3037529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3037562Z fn() 2025-12-04T13:38:32.3037714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3037754Z method(*args, **kwargs) 2025-12-04T13:38:32.3037906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3037945Z method(*args, **kwargs) 2025-12-04T13:38:32.3038094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3038129Z with policy(): 2025-12-04T13:38:32.3038281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3038321Z raise RuntimeError(msg) 2025-12-04T13:38:32.3038672Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 
2025-12-04T13:38:32.3038675Z 2025-12-04T13:38:32.3038749Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3038963Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3038967Z 2025-12-04T13:38:32.3039052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3039114Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.3039175Z ====================== 1 failed, 32 deselected in 11.38s ======================= 2025-12-04T13:38:32.3039211Z Got exit code 1 2025-12-04T13:38:32.3039251Z Retrying single test... 2025-12-04T13:38:32.3039440Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-761ce9b29c6b03b5.xml 2025-12-04T13:38:32.3039498Z ============================= test session starts ============================== 2025-12-04T13:38:32.3039650Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3039691Z cachedir: .pytest_cache 2025-12-04T13:38:32.3039863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3039909Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3039948Z configfile: pytest.ini 2025-12-04T13:38:32.3040110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3040184Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3040398Z stepcurrent: skipping 30 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3040440Z Running 1 items in this shard 2025-12-04T13:38:32.3040442Z 2025-12-04T13:38:32.3040735Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda I1204 13:37:07.182000 433695 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 433764 2025-12-04T13:38:32.3040889Z I1204 13:37:07.183000 433695 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 433765 2025-12-04T13:38:32.3041039Z I1204 13:37:07.183000 433695 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 433766 2025-12-04T13:38:32.3041189Z I1204 13:37:07.184000 433695 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 433767 2025-12-04T13:38:32.3041560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3041609Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3041960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3042007Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3042358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3042414Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3042767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3042810Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3043389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3043426Z _warn_cpu_init() 2025-12-04T13:38:32.3044016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
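Note: the enable_nested_tensor UserWarning repeated above fires because the encoder layer is built with the default batch_first=False. Purely as an illustration of the API the warning points at (illustrative sizes, not the test's actual model), a layer built batch-first keeps the nested-tensor fast path usable:

import torch.nn as nn

# The relevant bit is batch_first=True on the layer, which lets
# enable_nested_tensor=True take effect without the warning.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)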
2025-12-04T13:38:32.3044054Z _warn_cpu_init() 2025-12-04T13:38:32.3044628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3044667Z _warn_cpu_init() 2025-12-04T13:38:32.3045236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3045273Z _warn_cpu_init() 2025-12-04T13:38:32.3045564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.3045605Z return func(*args, **kwargs) 2025-12-04T13:38:32.3045765Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3045927Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3046219Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3046374Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3046659Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3046785Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3047071Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3047220Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3047495Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3047642Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3047916Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3048053Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3048344Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3048500Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3048972Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 2250244096 and is now 3852468224. 2025-12-04T13:38:32.3049086Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3049284Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3049655Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3049769Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3049980Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3050156Z [rank3]:E1204 13:37:16.386000 433767 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3050194Z dist init r=3, world=4 2025-12-04T13:38:32.3050331Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3050490Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3050777Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3050930Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3051226Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3051349Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3051625Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3051772Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3052047Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3052194Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3052493Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3052628Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3052922Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3053071Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3053542Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 1. CUDA driver allocated memory was 2317352960 and is now 3919577088. 
2025-12-04T13:38:32.3053656Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3053848Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3054191Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3054313Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3054525Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3054689Z [rank1]:E1204 13:37:16.399000 433765 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3054726Z dist init r=1, world=4 2025-12-04T13:38:32.3054863Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3055022Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3055319Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3055472Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3055757Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3055878Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3056153Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3056299Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3056576Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3056731Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3057005Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3057149Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3057424Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3057572Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3058037Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 2453667840 and is now 4055891968. 2025-12-04T13:38:32.3058151Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3058346Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3058698Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3058810Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3059021Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3059185Z [rank0]:E1204 13:37:16.407000 433764 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3059222Z dist init r=0, world=4 2025-12-04T13:38:32.3059358Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3059528Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3059851Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3060004Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3060287Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3060409Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3060687Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3060833Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.3061121Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3061285Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3061559Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3061696Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3061976Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3062122Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3062588Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 295424 on device 2. CUDA driver allocated memory was 2300575744 and is now 3902799872. 2025-12-04T13:38:32.3062714Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3062907Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3063249Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3063361Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3063572Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3063735Z [rank2]:E1204 13:37:16.443000 433766 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3063786Z dist init r=2, world=4 2025-12-04T13:38:32.3064124Z [rank0]:[W1204 13:37:16.662245851 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.3064162Z FAILED [11.1222s] [100%] 2025-12-04T13:38:32.3064164Z 2025-12-04T13:38:32.3064220Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3064315Z ________ TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda ________ 2025-12-04T13:38:32.3064360Z Traceback (most recent call last): 2025-12-04T13:38:32.3064522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3064566Z self._join_processes(fn) 2025-12-04T13:38:32.3064739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3064792Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3064979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3065022Z raise RuntimeError(error) 2025-12-04T13:38:32.3065099Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3065156Z Traceback (most recent call last): 2025-12-04T13:38:32.3065315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3065357Z getattr(self, test_name)() 2025-12-04T13:38:32.3065513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3065548Z fn() 2025-12-04T13:38:32.3065699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3065739Z method(*args, **kwargs) 2025-12-04T13:38:32.3065889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3065929Z method(*args, **kwargs) 2025-12-04T13:38:32.3066080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3066117Z with policy(): 2025-12-04T13:38:32.3066271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3066310Z raise RuntimeError(msg) 2025-12-04T13:38:32.3066656Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 2250244096 and is now 3852468224. 2025-12-04T13:38:32.3066668Z 2025-12-04T13:38:32.3066742Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3066957Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3066959Z 2025-12-04T13:38:32.3067044Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3067047Z 2025-12-04T13:38:32.3067049Z 2025-12-04T13:38:32.3067122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3067208Z Process 3 terminated with exit code 10, terminating remaining processes. 
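Note: the ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") asks for an explicit teardown of the default group. A minimal sketch of the usual per-rank worker shape, assuming the launcher provides MASTER_ADDR/MASTER_PORT; not the test harness code:

import torch.distributed as dist

def run_worker(rank, world_size):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # test or training body
    finally:
        # Explicit shutdown releases NCCL communicators deterministically and
        # avoids the warning seen in this log.
        dist.destroy_process_group()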
2025-12-04T13:38:32.3067443Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-761ce9b29c6b03b5.xml - 2025-12-04T13:38:32.3067514Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3067748Z FAILED [11.1222s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3067794Z Traceback (most recent call last): 2025-12-04T13:38:32.3067957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3068000Z getattr(self, test_name)() 2025-12-04T13:38:32.3068160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3068194Z fn() 2025-12-04T13:38:32.3068345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3068385Z method(*args, **kwargs) 2025-12-04T13:38:32.3068537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3068576Z method(*args, **kwargs) 2025-12-04T13:38:32.3068727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3068775Z with policy(): 2025-12-04T13:38:32.3068928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3068980Z raise RuntimeError(msg) 2025-12-04T13:38:32.3069325Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 2250244096 and is now 3852468224. 2025-12-04T13:38:32.3069330Z 2025-12-04T13:38:32.3069402Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3069654Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3069656Z 2025-12-04T13:38:32.3069741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3069804Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.3069865Z ====================== 1 failed, 32 deselected in 11.27s ======================= 2025-12-04T13:38:32.3069902Z Got exit code 1 2025-12-04T13:38:32.3070067Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda 2025-12-04T13:38:32.3070195Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.3070395Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96f31bdbacda79ff.xml 2025-12-04T13:38:32.3070452Z ============================= test session starts ============================== 2025-12-04T13:38:32.3070563Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3070603Z cachedir: .pytest_cache 2025-12-04T13:38:32.3070760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3070806Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3070845Z configfile: pytest.ini 2025-12-04T13:38:32.3071008Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3071079Z collecting ... collected 60 items / 31 deselected / 29 selected 2025-12-04T13:38:32.3071132Z stepcurrent: skipping 31 already run items. 2025-12-04T13:38:32.3071174Z Running 2 items in this shard 2025-12-04T13:38:32.3071177Z 2025-12-04T13:38:32.3071498Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 13:37:21.064000 434097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 434166 2025-12-04T13:38:32.3071655Z I1204 13:37:21.065000 434097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 434167 2025-12-04T13:38:32.3071806Z I1204 13:37:21.065000 434097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 434168 2025-12-04T13:38:32.3071957Z I1204 13:37:21.066000 434097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 434169 2025-12-04T13:38:32.3072316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3072365Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3072729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3072775Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3073125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3073181Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3073533Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3073576Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3074161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3074199Z _warn_cpu_init() 2025-12-04T13:38:32.3074782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3074834Z _warn_cpu_init() 2025-12-04T13:38:32.3075401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3075438Z _warn_cpu_init() 2025-12-04T13:38:32.3075738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.3075782Z return func(*args, **kwargs) 2025-12-04T13:38:32.3076353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.3076391Z _warn_cpu_init() 2025-12-04T13:38:32.3076533Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3076693Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3076984Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3077153Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3077437Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3077571Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3077847Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3077997Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3078274Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3078422Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3078696Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3078832Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3079119Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3079267Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3079771Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 
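Note: the repeated _warn_cpu_init() messages recommend handing FSDP a device_id so a CPU-constructed module is moved to the GPU before sharding initialization. A hedged sketch of that call (nn.Linear stands in for the test's transformer; assumes torch.distributed is already initialized in this process):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = nn.Linear(16, 16)  # stand-in for the CPU-constructed model
fsdp_module = FSDP(module, device_id=torch.cuda.current_device())
# With device_id given, FSDP moves the module to that GPU before sharding,
# which also satisfies the sync_module_states=True requirement mentioned above.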
2025-12-04T13:38:32.3079887Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3080098Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3080459Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3080572Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3080783Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3080946Z [rank3]:E1204 13:37:30.653000 434169 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3080985Z dist init r=3, world=4 2025-12-04T13:38:32.3081123Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3081283Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3081583Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3081747Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3082030Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3082155Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3082430Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3082577Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3082856Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3083003Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3083292Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3083427Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3083704Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3083851Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3084333Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 2025-12-04T13:38:32.3084449Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3084642Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3084998Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3085111Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3085323Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3085486Z [rank2]:E1204 13:37:30.654000 434168 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3085523Z dist init r=2, world=4 2025-12-04T13:38:32.3085670Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3085828Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3086122Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3086274Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3086559Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3086681Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3086959Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3087105Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.3087383Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3087540Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3087815Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3087951Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3088226Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3088373Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3088858Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 2025-12-04T13:38:32.3088971Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3089166Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3089517Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3089665Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3089888Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3090052Z [rank0]:E1204 13:37:30.701000 434166 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3090102Z dist init r=0, world=4 2025-12-04T13:38:32.3090238Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3090396Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3090682Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3090833Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.3091116Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3091240Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3091514Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3091676Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3091951Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3092099Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3092375Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3092511Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3092807Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3092953Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3093429Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 
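Note: the earlier c10d_logger warning ("barrier(): using the device under current context") suggests binding the process group to a device at init time. A hedged sketch of that pattern, assuming one GPU per rank and env:// rendezvous variables set by the launcher; function name and arguments are illustrative:

import torch
import torch.distributed as dist

def init_worker(rank, world_size):
    # Binding the group to a concrete device lets c10d route collectives such
    # as barrier() without guessing from the current device context.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),
    )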
2025-12-04T13:38:32.3093543Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3093736Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3094093Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3094215Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3094425Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3094599Z [rank1]:E1204 13:37:30.702000 434167 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3094637Z dist init r=1, world=4 2025-12-04T13:38:32.3094973Z [rank0]:[W1204 13:37:31.081157773 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.3095013Z FAILED [11.5241s] [ 50%] 2025-12-04T13:38:32.3095015Z 2025-12-04T13:38:32.3095072Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3095170Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T13:38:32.3095216Z Traceback (most recent call last): 2025-12-04T13:38:32.3095380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3095423Z self._join_processes(fn) 2025-12-04T13:38:32.3095594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3095658Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3095835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3095878Z raise RuntimeError(error) 2025-12-04T13:38:32.3095956Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3096001Z Traceback (most recent call last): 2025-12-04T13:38:32.3096161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3096203Z getattr(self, test_name)() 2025-12-04T13:38:32.3096362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3096396Z fn() 2025-12-04T13:38:32.3096548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3096589Z method(*args, **kwargs) 2025-12-04T13:38:32.3096749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3096790Z method(*args, **kwargs) 2025-12-04T13:38:32.3096940Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3096977Z with policy(): 2025-12-04T13:38:32.3097129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3097170Z raise RuntimeError(msg) 2025-12-04T13:38:32.3097520Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:38:32.3097523Z 2025-12-04T13:38:32.3097597Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3097826Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3097828Z 2025-12-04T13:38:32.3097929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3097931Z 2025-12-04T13:38:32.3097933Z 2025-12-04T13:38:32.3098008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3098105Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.3098338Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96f31bdbacda79ff.xml - 2025-12-04T13:38:32.3098397Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3098644Z FAILED [11.5241s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3098690Z Traceback (most recent call last): 2025-12-04T13:38:32.3098855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3098898Z getattr(self, test_name)() 2025-12-04T13:38:32.3099058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3099093Z fn() 2025-12-04T13:38:32.3099243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3099283Z method(*args, **kwargs) 2025-12-04T13:38:32.3099433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3099482Z method(*args, **kwargs) 2025-12-04T13:38:32.3099666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3099703Z with policy(): 2025-12-04T13:38:32.3099854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3099894Z raise RuntimeError(msg) 2025-12-04T13:38:32.3100245Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 
2025-12-04T13:38:32.3100248Z 2025-12-04T13:38:32.3100323Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3100563Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3100567Z 2025-12-04T13:38:32.3100654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3100716Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.3100777Z ====================== 1 failed, 31 deselected in 11.69s ======================= 2025-12-04T13:38:32.3100814Z Got exit code 1 2025-12-04T13:38:32.3100853Z Retrying single test... 2025-12-04T13:38:32.3101046Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4c42c4f7d061028c.xml 2025-12-04T13:38:32.3101102Z ============================= test session starts ============================== 2025-12-04T13:38:32.3101214Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3101255Z cachedir: .pytest_cache 2025-12-04T13:38:32.3101413Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3101458Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3101497Z configfile: pytest.ini 2025-12-04T13:38:32.3101674Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3101748Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3101967Z stepcurrent: skipping 31 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3102024Z Running 1 items in this shard 2025-12-04T13:38:32.3102026Z 2025-12-04T13:38:32.3102331Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 13:37:35.304000 434499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 434568 2025-12-04T13:38:32.3102488Z I1204 13:37:35.305000 434499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 434569 2025-12-04T13:38:32.3102639Z I1204 13:37:35.305000 434499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 434570 2025-12-04T13:38:32.3102790Z I1204 13:37:35.306000 434499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 434571 2025-12-04T13:38:32.3103155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3103202Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3103557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3103616Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3103971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3104016Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3104365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3104409Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3104996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3105034Z _warn_cpu_init() 2025-12-04T13:38:32.3105609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.3105648Z _warn_cpu_init() 2025-12-04T13:38:32.3106227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3108672Z _warn_cpu_init() 2025-12-04T13:38:32.3108969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.3109013Z return func(*args, **kwargs) 2025-12-04T13:38:32.3109620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3109659Z _warn_cpu_init() 2025-12-04T13:38:32.3109803Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3109967Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3110264Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3110452Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3110737Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3110861Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3111140Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3111289Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3111585Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3111732Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3112006Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3112143Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3112420Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3112570Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3113067Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:38:32.3113197Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3113391Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3113746Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3113861Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3114073Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3114238Z [rank3]:E1204 13:37:45.090000 434571 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3114276Z dist init r=3, world=4 2025-12-04T13:38:32.3114413Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3114572Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3114872Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3115027Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3115309Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3115435Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3115720Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3115868Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3116142Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3116289Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3116566Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3116701Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3116984Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3117141Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3117616Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:38:32.3117739Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3117935Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3118292Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3118403Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3118615Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3118777Z [rank2]:E1204 13:37:45.120000 434570 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3118824Z dist init r=2, world=4 2025-12-04T13:38:32.3118960Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3119119Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3119405Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3119559Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3119875Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3120014Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3120292Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3120438Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3120717Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3120861Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3121139Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3121273Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3121564Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3121727Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3122204Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 2025-12-04T13:38:32.3122318Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3122512Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3122867Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3122980Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3123202Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3123365Z [rank0]:E1204 13:37:45.129000 434568 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3123403Z dist init r=0, world=4 2025-12-04T13:38:32.3123540Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3123697Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3123986Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3124139Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3124432Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3124557Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3124829Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3124976Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.3125250Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3125398Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3125682Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3125819Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3126111Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3126258Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3126735Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 2025-12-04T13:38:32.3126848Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3127043Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3127396Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3127517Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3127728Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3127891Z [rank1]:E1204 13:37:45.134000 434569 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3127930Z dist init r=1, world=4 2025-12-04T13:38:32.3128265Z [rank0]:[W1204 13:37:45.413622202 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.3128306Z FAILED [11.8208s] [100%] 2025-12-04T13:38:32.3128308Z 2025-12-04T13:38:32.3128373Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3128473Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T13:38:32.3128518Z Traceback (most recent call last): 2025-12-04T13:38:32.3128682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3128724Z self._join_processes(fn) 2025-12-04T13:38:32.3128897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3128951Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3129129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3129174Z raise RuntimeError(error) 2025-12-04T13:38:32.3129251Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3129297Z Traceback (most recent call last): 2025-12-04T13:38:32.3129458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3129499Z getattr(self, test_name)() 2025-12-04T13:38:32.3129709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3129744Z fn() 2025-12-04T13:38:32.3129910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3129951Z method(*args, **kwargs) 2025-12-04T13:38:32.3130101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3130140Z method(*args, **kwargs) 2025-12-04T13:38:32.3130293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3130329Z with policy(): 2025-12-04T13:38:32.3130570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3130612Z raise RuntimeError(msg) 2025-12-04T13:38:32.3130960Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 
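Two of the warnings above concern process-group lifecycle: the barrier() warning suggests passing `device_id` to `init_process_group`, and the NCCL shutdown warning notes that `destroy_process_group()` was never called before exit. A rough sketch of that lifecycle is below; it assumes the usual RANK/MASTER_ADDR/MASTER_PORT/WORLD_SIZE environment from a launcher and is not the test harness's own setup code.

    import os
    import torch
    import torch.distributed as dist

    rank = int(os.environ["RANK"])             # assumed to be set by the launcher
    dist.init_process_group(
        backend="nccl",                        # maps to RCCL on ROCm builds
        device_id=torch.device("cuda", rank),  # silences the barrier() device warning
    )
    try:
        dist.barrier()
        # ... per-rank work would go here ...
    finally:
        dist.destroy_process_group()           # avoids the resource-leak warning at exit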
2025-12-04T13:38:32.3130963Z 2025-12-04T13:38:32.3131039Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3131267Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3131283Z 2025-12-04T13:38:32.3131371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3131373Z 2025-12-04T13:38:32.3131375Z 2025-12-04T13:38:32.3131451Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3131537Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.3131768Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4c42c4f7d061028c.xml - 2025-12-04T13:38:32.3131828Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3132072Z FAILED [11.8208s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3132116Z Traceback (most recent call last): 2025-12-04T13:38:32.3132295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3132337Z getattr(self, test_name)() 2025-12-04T13:38:32.3132497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3132530Z fn() 2025-12-04T13:38:32.3132681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3132720Z method(*args, **kwargs) 2025-12-04T13:38:32.3132872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3132911Z method(*args, **kwargs) 2025-12-04T13:38:32.3133060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3133097Z with policy(): 2025-12-04T13:38:32.3133249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3133290Z raise RuntimeError(msg) 2025-12-04T13:38:32.3133651Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:38:32.3133654Z 2025-12-04T13:38:32.3133739Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3133968Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3133971Z 2025-12-04T13:38:32.3134056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3134118Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.3134182Z ====================== 1 failed, 32 deselected in 11.98s ======================= 2025-12-04T13:38:32.3134218Z Got exit code 1 2025-12-04T13:38:32.3134257Z Retrying single test... 2025-12-04T13:38:32.3134446Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0af34500f77d76fa.xml 2025-12-04T13:38:32.3134505Z ============================= test session starts ============================== 2025-12-04T13:38:32.3134621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3134661Z cachedir: .pytest_cache 2025-12-04T13:38:32.3134820Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3134865Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3134914Z configfile: pytest.ini 2025-12-04T13:38:32.3135082Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3135157Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3135377Z stepcurrent: skipping 31 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3135419Z Running 1 items in this shard 2025-12-04T13:38:32.3135422Z 2025-12-04T13:38:32.3135725Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda I1204 13:37:49.642000 434901 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 434970 2025-12-04T13:38:32.3135882Z I1204 13:37:49.643000 434901 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 434971 2025-12-04T13:38:32.3136045Z I1204 13:37:49.643000 434901 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 434972 2025-12-04T13:38:32.3136196Z I1204 13:37:49.644000 434901 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 434973 2025-12-04T13:38:32.3136558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3136606Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3136958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3137003Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3137356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3137399Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3137760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because 
encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3137815Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3138390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3138428Z _warn_cpu_init() 2025-12-04T13:38:32.3138719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T13:38:32.3138761Z return func(*args, **kwargs) 2025-12-04T13:38:32.3139336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3139383Z _warn_cpu_init() 2025-12-04T13:38:32.3139992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T13:38:32.3140028Z _warn_cpu_init() 2025-12-04T13:38:32.3140605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T13:38:32.3140641Z _warn_cpu_init() 2025-12-04T13:38:32.3140784Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3140944Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3141233Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3141388Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3141672Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3141797Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3142090Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3142240Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3142528Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3142676Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3142950Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3143086Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3143363Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3143510Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3143987Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. CUDA driver allocated memory was 2317352960 and is now 3896508416. 
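The UserWarning from torch/nn/modules/transformer.py repeated above notes that `enable_nested_tensor=True` has no effect unless the encoder layer is built with `batch_first=True`. A small sketch with hypothetical sizes:

    import torch.nn as nn

    # batch_first=True lets TransformerEncoder use the nested-tensor fast path
    # that the warning says is currently disabled.
    encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, enable_nested_tensor=True)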
2025-12-04T13:38:32.3144116Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3144313Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3144673Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3144787Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3145010Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3145174Z [rank1]:E1204 13:37:59.206000 434971 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3145213Z dist init r=1, world=4 2025-12-04T13:38:32.3145351Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3145510Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3145794Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3145949Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3146244Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3146367Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3146652Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3146798Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3147075Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3147221Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3147496Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3147634Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3147909Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3148068Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3148540Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 2025-12-04T13:38:32.3148656Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3148850Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3149218Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3149332Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3149543Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3149748Z [rank2]:E1204 13:37:59.208000 434972 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3149784Z dist init r=2, world=4 2025-12-04T13:38:32.3149920Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3150081Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3150368Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3150534Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3150817Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3150953Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3151229Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3151379Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T13:38:32.3151658Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3151805Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3152081Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3152231Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3152508Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3152654Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3153125Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 114176 on device 3. CUDA driver allocated memory was 2250244096 and is now 3829399552. 2025-12-04T13:38:32.3153238Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3153445Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3153802Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3153914Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3154124Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3154286Z [rank3]:E1204 13:37:59.271000 434973 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3154326Z dist init r=3, world=4 2025-12-04T13:38:32.3154461Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3154624Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3154919Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3155081Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T13:38:32.3155364Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3155488Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3155766Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3155912Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3156190Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3156335Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3156623Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3156758Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3157034Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3157182Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3157668Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 2453667840 and is now 4032823296. 
2025-12-04T13:38:32.3157783Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3157976Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3158329Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3158442Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3158654Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3158816Z [rank0]:E1204 13:37:59.288000 434970 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3158853Z dist init r=0, world=4 2025-12-04T13:38:32.3159198Z [rank0]:[W1204 13:37:59.676316395 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T13:38:32.3159246Z FAILED [11.4239s] [100%] 2025-12-04T13:38:32.3159249Z 2025-12-04T13:38:32.3159305Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3159403Z ____ TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda ____ 2025-12-04T13:38:32.3159450Z Traceback (most recent call last): 2025-12-04T13:38:32.3159649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3159692Z self._join_processes(fn) 2025-12-04T13:38:32.3159864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3159917Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3160094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3160138Z raise RuntimeError(error) 2025-12-04T13:38:32.3160218Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.3160263Z Traceback (most recent call last): 2025-12-04T13:38:32.3160425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3160484Z getattr(self, test_name)() 2025-12-04T13:38:32.3160642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3160675Z fn() 2025-12-04T13:38:32.3160830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3160869Z method(*args, **kwargs) 2025-12-04T13:38:32.3161021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3161061Z method(*args, **kwargs) 2025-12-04T13:38:32.3161211Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3161247Z with policy(): 2025-12-04T13:38:32.3161400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3161454Z raise RuntimeError(msg) 2025-12-04T13:38:32.3161801Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 2025-12-04T13:38:32.3161804Z 2025-12-04T13:38:32.3161878Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3162107Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3162110Z 2025-12-04T13:38:32.3162196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3162198Z 2025-12-04T13:38:32.3162200Z 2025-12-04T13:38:32.3162274Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3162362Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.3162593Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0af34500f77d76fa.xml - 2025-12-04T13:38:32.3162664Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3162909Z FAILED [11.4239s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.3162966Z Traceback (most recent call last): 2025-12-04T13:38:32.3163131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3163173Z getattr(self, test_name)() 2025-12-04T13:38:32.3163332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3163368Z fn() 2025-12-04T13:38:32.3163519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3163558Z method(*args, **kwargs) 2025-12-04T13:38:32.3163710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3163749Z method(*args, **kwargs) 2025-12-04T13:38:32.3163901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3163937Z with policy(): 2025-12-04T13:38:32.3164089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3164128Z raise RuntimeError(msg) 2025-12-04T13:38:32.3164487Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 2300575744 and is now 3879731200. 
2025-12-04T13:38:32.3164490Z 2025-12-04T13:38:32.3164563Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3164790Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3164794Z 2025-12-04T13:38:32.3164879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3164942Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.3165003Z ====================== 1 failed, 32 deselected in 11.58s ======================= 2025-12-04T13:38:32.3165040Z Got exit code 1 2025-12-04T13:38:32.3165230Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda 2025-12-04T13:38:32.3165359Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.3165550Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a024f07c1aa82907.xml 2025-12-04T13:38:32.3165607Z ============================= test session starts ============================== 2025-12-04T13:38:32.3165719Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3165762Z cachedir: .pytest_cache 2025-12-04T13:38:32.3165920Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3165967Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3166006Z configfile: pytest.ini 2025-12-04T13:38:32.3166172Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3166243Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3166296Z stepcurrent: skipping 32 already run items. 
2025-12-04T13:38:32.3166338Z Running 1 items in this shard 2025-12-04T13:38:32.3166340Z 2025-12-04T13:38:32.3166645Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda I1204 13:38:03.779000 435303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 435372 2025-12-04T13:38:32.3166809Z I1204 13:38:03.780000 435303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 435373 2025-12-04T13:38:32.3166962Z I1204 13:38:03.780000 435303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 435374 2025-12-04T13:38:32.3167113Z I1204 13:38:03.781000 435303 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 435375 2025-12-04T13:38:32.3167478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3167527Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3167817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3167883Z {} 2025-12-04T13:38:32.3167987Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3168060Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3168569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3168632Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3168988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3169034Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3169338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3169400Z {} 2025-12-04T13:38:32.3169505Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3169620Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3170117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.3170177Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3170533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3170580Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3170878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3170939Z {} 2025-12-04T13:38:32.3171053Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3171124Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3171615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3171675Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3172032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3172078Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3172368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3172427Z {} 2025-12-04T13:38:32.3172547Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3172617Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3173104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
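The repeated UserWarning from torch/distributed/fsdp/_init_utils.py suggests either calling torch.cuda.set_device() before FSDP initialization or passing an indexed device as `device_id` instead of the bare "cuda" string. A minimal sketch of both options; `model` and `rank` are placeholders and this is not the wrapping code used by test_fsdp_core.py:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap(model: torch.nn.Module, rank: int) -> FSDP:
    # Option 1: make the current device explicit before constructing FSDP.
    torch.cuda.set_device(rank)
    # Option 2: pass a device with an explicit index rather than plain "cuda".
    return FSDP(model, device_id=torch.device("cuda", rank))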
2025-12-04T13:38:32.3173163Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3173307Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3173469Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3173777Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3173931Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3174223Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3174349Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3174630Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3174782Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3175070Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3175217Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3175502Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3175639Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3175920Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3176068Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3176538Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 
2025-12-04T13:38:32.3176652Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3176849Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3177207Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3177319Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3177533Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3177696Z [rank3]:E1204 13:38:09.408000 435375 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3177734Z dist init r=3, world=4 2025-12-04T13:38:32.3177882Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3178042Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3178329Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3178484Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3178768Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3178892Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3179170Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3179327Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3179646Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3179805Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3180080Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3180216Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3180496Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3180644Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3181108Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3024093184. 2025-12-04T13:38:32.3181236Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3181432Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3181778Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3181892Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3182101Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3182278Z [rank2]:E1204 13:38:09.424000 435374 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3182315Z dist init r=2, world=4 2025-12-04T13:38:32.3182453Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3182610Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3182897Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3183049Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3183334Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3183457Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3183743Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3183900Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.3184176Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3184325Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3184603Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3184741Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3185018Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3185166Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3185642Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3040870400. 2025-12-04T13:38:32.3185755Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3185949Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3186296Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3186422Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3186632Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3186798Z [rank1]:E1204 13:38:09.435000 435373 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3186835Z dist init r=1, world=4 2025-12-04T13:38:32.3186972Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3187133Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3187420Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3187574Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:38:32.3187868Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3187991Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3188283Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3188428Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3188710Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3188858Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3189134Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3189268Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3189548Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3189740Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3190204Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3177185280. 
2025-12-04T13:38:32.3190319Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3190512Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3190872Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3190986Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3191200Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3191366Z [rank0]:E1204 13:38:09.459000 435372 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3191403Z dist init r=0, world=4 2025-12-04T13:38:32.3191442Z FAILED [6.7165s] [100%] 2025-12-04T13:38:32.3191444Z 2025-12-04T13:38:32.3191501Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3191598Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda _______ 2025-12-04T13:38:32.3191643Z Traceback (most recent call last): 2025-12-04T13:38:32.3191805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3191860Z self._join_processes(fn) 2025-12-04T13:38:32.3192033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3192099Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3192278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3192321Z raise RuntimeError(error) 2025-12-04T13:38:32.3192400Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3192444Z Traceback (most recent call last): 2025-12-04T13:38:32.3192606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3192647Z getattr(self, test_name)() 2025-12-04T13:38:32.3192807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3192841Z fn() 2025-12-04T13:38:32.3192993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3193033Z method(*args, **kwargs) 2025-12-04T13:38:32.3193187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3193226Z method(*args, **kwargs) 2025-12-04T13:38:32.3193378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3193426Z with policy(): 2025-12-04T13:38:32.3193580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T13:38:32.3193620Z raise RuntimeError(msg) 2025-12-04T13:38:32.3193961Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 2025-12-04T13:38:32.3193965Z 2025-12-04T13:38:32.3194040Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3194258Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3194260Z 2025-12-04T13:38:32.3194349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3194351Z 2025-12-04T13:38:32.3194353Z 2025-12-04T13:38:32.3194438Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3194525Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T13:38:32.3194760Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a024f07c1aa82907.xml - 2025-12-04T13:38:32.3194822Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3195058Z FAILED [6.7165s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3195102Z Traceback (most recent call last): 2025-12-04T13:38:32.3195266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3195309Z getattr(self, test_name)() 2025-12-04T13:38:32.3195471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3195505Z fn() 2025-12-04T13:38:32.3195670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3195708Z method(*args, **kwargs) 2025-12-04T13:38:32.3195859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3195908Z method(*args, **kwargs) 2025-12-04T13:38:32.3196058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3196093Z with policy(): 2025-12-04T13:38:32.3196246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3196286Z raise RuntimeError(msg) 2025-12-04T13:38:32.3196627Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 
2025-12-04T13:38:32.3196630Z 2025-12-04T13:38:32.3196702Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3196921Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3196924Z 2025-12-04T13:38:32.3197011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3197073Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.3197147Z ======================= 1 failed, 32 deselected in 6.88s ======================= 2025-12-04T13:38:32.3197185Z Got exit code 1 2025-12-04T13:38:32.3197225Z Retrying single test... 2025-12-04T13:38:32.3197413Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f72400cd29f545d5.xml 2025-12-04T13:38:32.3197474Z ============================= test session starts ============================== 2025-12-04T13:38:32.3197586Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3197628Z cachedir: .pytest_cache 2025-12-04T13:38:32.3197787Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3197833Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3197871Z configfile: pytest.ini 2025-12-04T13:38:32.3198035Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3198121Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3198332Z stepcurrent: skipping 32 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3198374Z Running 1 items in this shard 2025-12-04T13:38:32.3198376Z 2025-12-04T13:38:32.3198670Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda I1204 13:38:13.063000 435697 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 435766 2025-12-04T13:38:32.3198826Z I1204 13:38:13.063000 435697 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 435767 2025-12-04T13:38:32.3198977Z I1204 13:38:13.064000 435697 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 435768 2025-12-04T13:38:32.3199130Z I1204 13:38:13.064000 435697 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 435769 2025-12-04T13:38:32.3199499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3199547Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3199871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3199950Z {} 2025-12-04T13:38:32.3200052Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3200126Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3200619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3200678Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3201035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3201081Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3201370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3201442Z {} 2025-12-04T13:38:32.3201546Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3201616Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3202104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3202164Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3202531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3202579Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3202865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3202925Z {} 2025-12-04T13:38:32.3203027Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3203099Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3203586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3203646Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3204017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3204071Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3204355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3204415Z {} 2025-12-04T13:38:32.3204519Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3204589Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3205080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.3205137Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3205280Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3205441Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3205741Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3205898Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3206182Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3206308Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3206584Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3206743Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3207020Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3207168Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3207444Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3207579Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3207857Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3208015Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3208481Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3024093184. 
2025-12-04T13:38:32.3208608Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3208804Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3209150Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3209262Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3209478Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3209679Z [rank2]:E1204 13:38:18.652000 435768 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3209733Z dist init r=2, world=4 2025-12-04T13:38:32.3209871Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3210031Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3210317Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3210470Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3210755Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3210891Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3211170Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3211316Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3211595Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3211744Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3212020Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3212157Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3212444Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3212603Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3213066Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3040870400. 2025-12-04T13:38:32.3213180Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3213376Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3213719Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3213834Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3214046Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3214223Z [rank1]:E1204 13:38:18.653000 435767 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3214262Z dist init r=1, world=4 2025-12-04T13:38:32.3214399Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3214558Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3214848Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3215002Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3215297Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3215421Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3215699Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3215846Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.3216120Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3216269Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3216557Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3216692Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3216983Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3217129Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3217595Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 2025-12-04T13:38:32.3217707Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3217902Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3218243Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3218366Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3218577Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3218741Z [rank3]:E1204 13:38:18.698000 435769 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3218779Z dist init r=3, world=4 2025-12-04T13:38:32.3218916Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3219076Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3219378Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3219532Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T13:38:32.3219860Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3219983Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3220258Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3220406Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3220683Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3220842Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3221118Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3221270Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3221547Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3221696Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3222156Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3177185280. 
2025-12-04T13:38:32.3222271Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3222464Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3222819Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3222933Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3223143Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3223309Z [rank0]:E1204 13:38:18.704000 435766 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3223346Z dist init r=0, world=4 2025-12-04T13:38:32.3223384Z FAILED [6.6161s] [100%] 2025-12-04T13:38:32.3223386Z 2025-12-04T13:38:32.3223442Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3223547Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda _______ 2025-12-04T13:38:32.3223593Z Traceback (most recent call last): 2025-12-04T13:38:32.3223757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3223800Z self._join_processes(fn) 2025-12-04T13:38:32.3223973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T13:38:32.3224026Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3224203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3224245Z raise RuntimeError(error) 2025-12-04T13:38:32.3224322Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.3224367Z Traceback (most recent call last): 2025-12-04T13:38:32.3224528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3224570Z getattr(self, test_name)() 2025-12-04T13:38:32.3224738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3224773Z fn() 2025-12-04T13:38:32.3224923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3224975Z method(*args, **kwargs) 2025-12-04T13:38:32.3225124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3225164Z method(*args, **kwargs) 2025-12-04T13:38:32.3225313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3225350Z with policy(): 2025-12-04T13:38:32.3225502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T13:38:32.3225543Z raise RuntimeError(msg) 2025-12-04T13:38:32.3225884Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3040870400. 2025-12-04T13:38:32.3225887Z 2025-12-04T13:38:32.3225962Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3226178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3226180Z 2025-12-04T13:38:32.3226278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3226280Z 2025-12-04T13:38:32.3226339Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.3226383Z Traceback (most recent call last): 2025-12-04T13:38:32.3226545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3226586Z getattr(self, test_name)() 2025-12-04T13:38:32.3226744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3226779Z fn() 2025-12-04T13:38:32.3226931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3226969Z method(*args, **kwargs) 2025-12-04T13:38:32.3227119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3227157Z method(*args, **kwargs) 2025-12-04T13:38:32.3227320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3227356Z with policy(): 2025-12-04T13:38:32.3227508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3227548Z raise RuntimeError(msg) 2025-12-04T13:38:32.3227888Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3024093184. 2025-12-04T13:38:32.3227891Z 2025-12-04T13:38:32.3227963Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3228178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3228181Z 2025-12-04T13:38:32.3228268Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3228270Z 2025-12-04T13:38:32.3228272Z 2025-12-04T13:38:32.3228346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3228443Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T13:38:32.3228676Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f72400cd29f545d5.xml - 2025-12-04T13:38:32.3228746Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3228979Z FAILED [6.6161s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T13:38:32.3229025Z Traceback (most recent call last): 2025-12-04T13:38:32.3229190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3229232Z getattr(self, test_name)() 2025-12-04T13:38:32.3229392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3229427Z fn() 2025-12-04T13:38:32.3229684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3229725Z method(*args, **kwargs) 2025-12-04T13:38:32.3229876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3229915Z method(*args, **kwargs) 2025-12-04T13:38:32.3230066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3230120Z with policy(): 2025-12-04T13:38:32.3230274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3230313Z raise RuntimeError(msg) 2025-12-04T13:38:32.3230657Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3040870400. 
2025-12-04T13:38:32.3230660Z 2025-12-04T13:38:32.3230731Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3230946Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3230948Z 2025-12-04T13:38:32.3231033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3231037Z 2025-12-04T13:38:32.3231108Z Process 2 exited with error code 10 and exception: 2025-12-04T13:38:32.3231151Z Traceback (most recent call last): 2025-12-04T13:38:32.3231313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3231353Z getattr(self, test_name)() 2025-12-04T13:38:32.3231513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3231548Z fn() 2025-12-04T13:38:32.3231697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3231736Z method(*args, **kwargs) 2025-12-04T13:38:32.3231884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3231924Z method(*args, **kwargs) 2025-12-04T13:38:32.3232074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3232109Z with policy(): 2025-12-04T13:38:32.3232260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3232300Z raise RuntimeError(msg) 2025-12-04T13:38:32.3232653Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3024093184. 2025-12-04T13:38:32.3232669Z 2025-12-04T13:38:32.3232742Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3232956Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3232959Z 2025-12-04T13:38:32.3233045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3233108Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:38:32.3233169Z ======================= 1 failed, 32 deselected in 6.78s ======================= 2025-12-04T13:38:32.3233205Z Got exit code 1 2025-12-04T13:38:32.3233245Z Retrying single test... 
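The failures above all come from the memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns on: as the tracebacks show, a context manager in torch/testing/_internal/common_utils.py snapshots caching-allocator and driver-level memory before the test body runs and raises in __exit__ if the numbers have grown. A minimal sketch of that before/after comparison, given purely as an illustration (check_for_cuda_memory_leak and its single-device scope are assumptions, not the harness's actual code):

import torch

def check_for_cuda_memory_leak(test_fn, device: int = 0):
    # Hypothetical illustration of a before/after memory comparison; the real
    # check in common_utils.py tracks every visible device and takes extra care
    # (e.g. around the caching allocator) to avoid false positives.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, _ = torch.cuda.mem_get_info(device)      # driver-level free bytes

    test_fn()

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA memory leak: allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver free memory {free_before} -> {free_after} bytes"
        )

In the log above the allocator count on each rank goes from 512 to 22528 bytes and driver-allocated memory grows by roughly 700 MB per device, which is the kind of growth such a before/after check flags.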
2025-12-04T13:38:32.3233434Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bafa624f55ba28c7.xml 2025-12-04T13:38:32.3233491Z ============================= test session starts ============================== 2025-12-04T13:38:32.3233603Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3233643Z cachedir: .pytest_cache 2025-12-04T13:38:32.3233803Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3233869Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3233909Z configfile: pytest.ini 2025-12-04T13:38:32.3234071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3234145Z collecting ... collected 60 items / 32 deselected / 28 selected 2025-12-04T13:38:32.3234357Z stepcurrent: skipping 32 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3234401Z Running 1 items in this shard 2025-12-04T13:38:32.3234403Z 2025-12-04T13:38:32.3234695Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda I1204 13:38:22.210000 436091 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 436160 2025-12-04T13:38:32.3234862Z I1204 13:38:22.211000 436091 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 436161 2025-12-04T13:38:32.3235014Z I1204 13:38:22.211000 436091 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 436162 2025-12-04T13:38:32.3235164Z I1204 13:38:22.212000 436091 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 436163 2025-12-04T13:38:32.3235523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3235570Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3235862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3235925Z {} 2025-12-04T13:38:32.3236029Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3236101Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3236609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.3236682Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3237037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3237085Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3237373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3237435Z {} 2025-12-04T13:38:32.3237537Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3237609Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3238095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3238167Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3238519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3238565Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3238850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3238909Z {} 2025-12-04T13:38:32.3239011Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3239093Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3239622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T13:38:32.3239680Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3240037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T13:38:32.3240081Z self.encoder = TransformerEncoder( 2025-12-04T13:38:32.3240371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T13:38:32.3240431Z {} 2025-12-04T13:38:32.3240545Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T13:38:32.3240616Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T13:38:32.3241101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T13:38:32.3241172Z device_from_device_id = _get_device_from_device_id( 2025-12-04T13:38:32.3241317Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3241478Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3241770Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3241928Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3242215Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3242352Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3242630Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3242778Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3243058Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3243204Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3243491Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3243628Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3243904Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3244053Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3244519Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 2025-12-04T13:38:32.3244636Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3244841Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3245188Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3245312Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3245522Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3245688Z [rank3]:E1204 13:38:27.865000 436163 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T13:38:32.3245726Z dist init r=3, world=4 2025-12-04T13:38:32.3245864Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3246022Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3246311Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3246464Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3246767Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3246892Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3247169Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3247317Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3247592Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3247751Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3248025Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3248161Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3248439Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3248586Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3249066Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 2453667840 and is now 3177185280. 
2025-12-04T13:38:32.3249179Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3249392Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3249755Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3249870Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3250082Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3250246Z [rank0]:E1204 13:38:27.883000 436160 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T13:38:32.3250286Z dist init r=0, world=4 2025-12-04T13:38:32.3250422Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3250580Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3250865Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3251033Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3251317Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3251440Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3251714Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3251872Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3252150Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3252296Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3252572Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3252707Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T13:38:32.3252985Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3253135Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3253611Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 2300575744 and is now 3024093184. 2025-12-04T13:38:32.3253736Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3253930Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3254272Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3254384Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3254595Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3254759Z [rank2]:E1204 13:38:27.895000 436162 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T13:38:32.3254797Z dist init r=2, world=4 2025-12-04T13:38:32.3254933Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T13:38:32.3255104Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T13:38:32.3255391Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3255543Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T13:38:32.3255829Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3255951Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T13:38:32.3256239Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3256386Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T13:38:32.3256663Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3256810Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T13:38:32.3257085Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3257222Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T13:38:32.3257507Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3257654Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T13:38:32.3258129Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 2317352960 and is now 3040870400. 2025-12-04T13:38:32.3258244Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3258438Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3258778Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3258891Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T13:38:32.3259100Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3259276Z [rank1]:E1204 13:38:27.924000 436161 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T13:38:32.3259314Z dist init r=1, world=4 2025-12-04T13:38:32.3259352Z FAILED [6.6151s] [100%] 2025-12-04T13:38:32.3259354Z 2025-12-04T13:38:32.3259409Z =================================== FAILURES =================================== 2025-12-04T13:38:32.3259503Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda _______ 2025-12-04T13:38:32.3259549Z Traceback (most recent call last): 2025-12-04T13:38:32.3259751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T13:38:32.3259794Z self._join_processes(fn) 2025-12-04T13:38:32.3259965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in 
_join_processes 2025-12-04T13:38:32.3260019Z self._check_return_codes(fn, elapsed_time) 2025-12-04T13:38:32.3260210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T13:38:32.3260253Z raise RuntimeError(error) 2025-12-04T13:38:32.3260331Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3260376Z Traceback (most recent call last): 2025-12-04T13:38:32.3260537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3260578Z getattr(self, test_name)() 2025-12-04T13:38:32.3260736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3260770Z fn() 2025-12-04T13:38:32.3260921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3260961Z method(*args, **kwargs) 2025-12-04T13:38:32.3261113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3261153Z method(*args, **kwargs) 2025-12-04T13:38:32.3261304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3261341Z with policy(): 2025-12-04T13:38:32.3261504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3261559Z raise RuntimeError(msg) 2025-12-04T13:38:32.3261894Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 2025-12-04T13:38:32.3261896Z 2025-12-04T13:38:32.3261969Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3262186Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3262189Z 2025-12-04T13:38:32.3262274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3262276Z 2025-12-04T13:38:32.3262279Z 2025-12-04T13:38:32.3262353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T13:38:32.3262438Z Process 3 terminated with exit code 10, terminating remaining processes. 
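Each rank also printed the FSDP UserWarning about `device_id` being passed as the bare device string "cuda" with no index; the warning's own remedy is to call torch.cuda.set_device() before FSDP initialization or to pass an explicit device index as `device_id`. A short sketch that follows that advice (the LOCAL_RANK convention, the nccl backend, and the nn.Linear stand-in module are assumptions for illustration, not taken from the test itself):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_fsdp_model() -> FSDP:
    # Assumed launch convention: one process per GPU with LOCAL_RANK exported
    # (as torchrun does); illustrative only.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    # Option 1 from the warning: make the current device explicit before FSDP init.
    torch.cuda.set_device(local_rank)

    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")

    model = nn.Linear(1024, 1024)  # stand-in for the wrapped module

    # Option 2 from the warning: pass an indexed device rather than plain "cuda".
    return FSDP(model, device_id=torch.device("cuda", local_rank))

Passing an indexed device avoids the warning outright; calling set_device first at least ensures the "current device" FSDP falls back to is the intended one. This addresses only the warning; the leak failure itself is what the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK repro command quoted above exercises.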
2025-12-04T13:38:32.3262678Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bafa624f55ba28c7.xml - 2025-12-04T13:38:32.3262738Z =========================== short test summary info ============================ 2025-12-04T13:38:32.3262969Z FAILED [6.6151s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T13:38:32.3263027Z Traceback (most recent call last): 2025-12-04T13:38:32.3263191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T13:38:32.3263234Z getattr(self, test_name)() 2025-12-04T13:38:32.3263393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T13:38:32.3263428Z fn() 2025-12-04T13:38:32.3263579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3263618Z method(*args, **kwargs) 2025-12-04T13:38:32.3263766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T13:38:32.3263806Z method(*args, **kwargs) 2025-12-04T13:38:32.3263963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T13:38:32.3264001Z with policy(): 2025-12-04T13:38:32.3264152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T13:38:32.3264193Z raise RuntimeError(msg) 2025-12-04T13:38:32.3264528Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 2250244096 and is now 2973761536. 2025-12-04T13:38:32.3264532Z 2025-12-04T13:38:32.3264606Z To execute this test, run the following from the base repo dir: 2025-12-04T13:38:32.3264825Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3264828Z 2025-12-04T13:38:32.3264915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T13:38:32.3264978Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T13:38:32.3265036Z ======================= 1 failed, 32 deselected in 6.76s ======================= 2025-12-04T13:38:32.3265088Z Got exit code 1 2025-12-04T13:38:32.3265256Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda 2025-12-04T13:38:32.3265394Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T13:38:32.3265584Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b83aefff71d6b746.xml 2025-12-04T13:38:32.3265641Z ============================= test session starts ============================== 2025-12-04T13:38:32.3265755Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T13:38:32.3265796Z cachedir: .pytest_cache 2025-12-04T13:38:32.3265951Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:38:32.3265997Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:38:32.3266036Z configfile: pytest.ini 2025-12-04T13:38:32.3266200Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:38:32.3266273Z collecting ... collected 60 items / 33 deselected / 27 selected 2025-12-04T13:38:32.3266326Z stepcurrent: skipping 33 already run items. 2025-12-04T13:38:32.3266367Z Running 0 items in this shard 2025-12-04T13:38:32.3266369Z 2025-12-04T13:38:32.3266604Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b83aefff71d6b746.xml - 2025-12-04T13:38:32.3266672Z ============================ 33 deselected in 0.01s ============================ 2025-12-04T13:38:32.3272245Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_False_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_none_cuda', 
'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_reduce_scatter_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_True_cuda'] 2025-12-04T13:38:32.3272276Z 2025-12-04T13:38:32.3272462Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 1/2 (test/test-reports/distributed.fsdp.test_fsdp_core_1.2_d5d5bc8f8345486d_.log) 2025-12-04T13:38:32.3272464Z 2025-12-04T13:38:32.3272587Z Finished distributed/fsdp/test_fsdp_core 1/2 ... 
[2025-12-04 13:38:31.931630][5233552.910669318], took 41.64min 2025-12-04T13:38:32.3272855Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:38:32.3272939Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:38:32.3273035Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T13:38:32.3273091Z Uploading artifacts took 0.00 seconds 2025-12-04T13:38:32.3273142Z distributed/fsdp/test_fsdp_core 1/2 failed! 2025-12-04T13:38:32.3273249Z Running distributed/test_c10d_spawn_ucc 1/1 ... [2025-12-04 13:38:31.934610][5233552.913651199] 2025-12-04T13:38:32.3273297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:38:32.3273626Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_spawn_ucc.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:38:31.934792] 2025-12-04T13:38:44.6649472Z 2025-12-04T13:38:44.6650102Z distributed/test_c10d_spawn_ucc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_spawn_ucc_1.1_4ecd37b7dc2a6472_.log 2025-12-04T13:38:44.6652316Z Running 6 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_gather, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all_single, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_allreduce, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_broadcast, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_reduce 2025-12-04T13:38:44.6654146Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_gather 2025-12-04T13:38:44.6654753Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all 2025-12-04T13:38:44.6655377Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all_single 2025-12-04T13:38:44.6656004Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_allreduce 2025-12-04T13:38:44.6656597Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_broadcast 2025-12-04T13:38:44.6657179Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_reduce 2025-12-04T13:38:44.6657509Z 2025-12-04T13:38:44.6657757Z Finished distributed/test_c10d_spawn_ucc 1/1 ... [2025-12-04 13:38:44.664747][5233565.643783775], took 0.21min 2025-12-04T13:38:44.6675090Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:38:44.6684633Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:38:44.6686758Z Running distributed/test_c10d_gloo 1/1 ... 
[2025-12-04 13:38:44.668582][5233565.647623846] 2025-12-04T13:38:44.6687106Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:38:44.6688958Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_gloo.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:38:44.668770] 2025-12-04T13:58:41.8360937Z 2025-12-04T13:58:41.8361841Z distributed/test_c10d_gloo 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_gloo_1.1_598729a4fcd85d87_.log 2025-12-04T13:58:41.8407963Z Running 246 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousTCPTest::test_tcp_init, test/distributed/test_c10d_gloo.py::RendezvousEnvTest::test_logging_init, test/distributed/test_c10d_gloo.py::TimeoutTest::test_default_store_timeout_gloo, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_empty_tensors, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_complex, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_cuda_dispatched, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output_unused_param, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing, 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_cpu, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_gpu_gloo, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_register_just_once, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_sparse_gradients, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_complex_params, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_init, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_return_type, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_find_unused_parameters_when_unused_parameters_empty, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_static_graph, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_integer_list, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_torch_device_list, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_2gpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_4gpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output_with_unused_parameters, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_sharded_tensor, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_invalid_powerSGD_state, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_save_load_checkpoint, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_optimizer, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_unused_parameters, test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_multi_bucket, 
test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_single_bucket, test/distributed/test_c10d_gloo.py::ReducerTest::test_single_dtype_single_bucket, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress_cuda, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_complex, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_cuda_dispatched, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks_cuda, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_long, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_complex, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_json, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_pickle, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics_cuda, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_cuda_dispatched, test/distributed/test_c10d_gloo.py::CommTest::test_bool_tensors, test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cpu, test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cuda, test/distributed/test_c10d_gloo.py::CommTest::test_gloo_rank_membership, test/distributed/test_c10d_gloo.py::CommTest::test_gloo_warn_not_in_group, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_default, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_subgroup, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_default_pg_gloo, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_gloo_new_group, test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_complex, test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_mismatch, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allgather_coalesced, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_collectives, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_default_process_group, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_monitored_barrier, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_duplicate_pg, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_sanity_check 2025-12-04T13:58:41.8439033Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousTCPTest::test_tcp_init 2025-12-04T13:58:41.8439321Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousEnvTest::test_logging_init 2025-12-04T13:58:41.8439648Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::TimeoutTest::test_default_store_timeout_gloo 2025-12-04T13:58:41.8439947Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics 2025-12-04T13:58:41.8440252Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_basics_cuda 2025-12-04T13:58:41.8440557Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_checks 2025-12-04T13:58:41.8440864Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_async 2025-12-04T13:58:41.8441186Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_checks 2025-12-04T13:58:41.8441506Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_inference_mode 2025-12-04T13:58:41.8441833Z Running 1 items in this 
shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_into_tensor_coalesced 2025-12-04T13:58:41.8442201Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_noncontiguous_input 2025-12-04T13:58:41.8442514Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress 2025-12-04T13:58:41.8442814Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress_cuda 2025-12-04T13:58:41.8443112Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics 2025-12-04T13:58:41.8443412Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics_cuda 2025-12-04T13:58:41.8443710Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_checks 2025-12-04T13:58:41.8444015Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_async 2025-12-04T13:58:41.8444352Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_basics 2025-12-04T13:58:41.8444673Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks 2025-12-04T13:58:41.8445005Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks_cuda 2025-12-04T13:58:41.8445332Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_stress 2025-12-04T13:58:41.8445662Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_op_timeout 2025-12-04T13:58:41.8445973Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_overall_timeout 2025-12-04T13:58:41.8446280Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress 2025-12-04T13:58:41.8446588Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress_cuda 2025-12-04T13:58:41.8446892Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_barrier_implies_wait 2025-12-04T13:58:41.8447217Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_block_current_stream_cuda 2025-12-04T13:58:41.8447525Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics 2025-12-04T13:58:41.8447847Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics_cuda 2025-12-04T13:58:41.8448145Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_checks 2025-12-04T13:58:41.8448440Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress 2025-12-04T13:58:41.8448746Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress_cuda 2025-12-04T13:58:41.8449044Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_empty_tensors 2025-12-04T13:58:41.8449333Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics 2025-12-04T13:58:41.8449702Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_basics_cuda 2025-12-04T13:58:41.8449998Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_checks 2025-12-04T13:58:41.8450303Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_noncontiguous_input 2025-12-04T13:58:41.8450606Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress 2025-12-04T13:58:41.8450918Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda 2025-12-04T13:58:41.8451223Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_multi_device_constructor 2025-12-04T13:58:41.8451523Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics 2025-12-04T13:58:41.8451815Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics_cuda 2025-12-04T13:58:41.8452108Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_checks 2025-12-04T13:58:41.8452396Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter 2025-12-04T13:58:41.8452692Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor 2025-12-04T13:58:41.8453037Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_scatter_tensor_coalesced 2025-12-04T13:58:41.8453346Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress 2025-12-04T13:58:41.8453635Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress_cuda 2025-12-04T13:58:41.8453928Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics 2025-12-04T13:58:41.8454220Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics_cuda 2025-12-04T13:58:41.8454514Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_checks 2025-12-04T13:58:41.8454800Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress 2025-12-04T13:58:41.8455094Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress_cuda 2025-12-04T13:58:41.8455398Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_all_to_all 2025-12-04T13:58:41.8455697Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_complex 2025-12-04T13:58:41.8456015Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_set_gloo_pg_timeout 2025-12-04T13:58:41.8456324Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics 2025-12-04T13:58:41.8456679Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics_cuda 2025-12-04T13:58:41.8457030Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_checks 2025-12-04T13:58:41.8457356Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_cuda_dispatched 2025-12-04T13:58:41.8457685Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output 2025-12-04T13:58:41.8458026Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output_unused_param 2025-12-04T13:58:41.8458391Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_module 2025-12-04T13:58:41.8458767Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing 2025-12-04T13:58:41.8459166Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_False 2025-12-04T13:58:41.8459557Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True 2025-12-04T13:58:41.8460017Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False 2025-12-04T13:58:41.8460447Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True 2025-12-04T13:58:41.8460852Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False 2025-12-04T13:58:41.8461243Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True 2025-12-04T13:58:41.8461626Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing 2025-12-04T13:58:41.8462020Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_False 2025-12-04T13:58:41.8462453Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True 2025-12-04T13:58:41.8462869Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_False 2025-12-04T13:58:41.8463282Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True 2025-12-04T13:58:41.8463670Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_cpu 2025-12-04T13:58:41.8464040Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_gpu_gloo 2025-12-04T13:58:41.8464410Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_register_just_once 2025-12-04T13:58:41.8464771Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_sparse_gradients 2025-12-04T13:58:41.8465131Z Running 1 items in this 
shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_complex_params 2025-12-04T13:58:41.8465468Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_init 2025-12-04T13:58:41.8465844Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_invalid_comm_hook_return_type 2025-12-04T13:58:41.8466233Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_find_unused_parameters_when_unused_parameters_empty 2025-12-04T13:58:41.8466619Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad 2025-12-04T13:58:41.8467003Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_grad_is_view 2025-12-04T13:58:41.8467409Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_static_graph 2025-12-04T13:58:41.8467814Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_integer_list 2025-12-04T13:58:41.8468226Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_1gpu_module_device_ids_torch_device_list 2025-12-04T13:58:41.8468607Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_2gpu_module 2025-12-04T13:58:41.8468945Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_4gpu_module 2025-12-04T13:58:41.8469298Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module 2025-12-04T13:58:41.8469698Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module_grad_is_view 2025-12-04T13:58:41.8470043Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output 2025-12-04T13:58:41.8470389Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output_with_unused_parameters 2025-12-04T13:58:41.8470747Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_sharded_tensor 2025-12-04T13:58:41.8471085Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_invalid_powerSGD_state 2025-12-04T13:58:41.8471436Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_save_load_checkpoint 2025-12-04T13:58:41.8471755Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients 2025-12-04T13:58:41.8472092Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients_grad_is_view 2025-12-04T13:58:41.8472446Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input 2025-12-04T13:58:41.8472805Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input 2025-12-04T13:58:41.8473128Z Running 1 
items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward 2025-12-04T13:58:41.8473412Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_optimizer 2025-12-04T13:58:41.8473717Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward_unused_parameters 2025-12-04T13:58:41.8474016Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_multi_bucket 2025-12-04T13:58:41.8474322Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_single_bucket 2025-12-04T13:58:41.8474614Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_single_dtype_single_bucket 2025-12-04T13:58:41.8474937Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics 2025-12-04T13:58:41.8475266Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_basics_cuda 2025-12-04T13:58:41.8475594Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_checks 2025-12-04T13:58:41.8475928Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_async 2025-12-04T13:58:41.8476277Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_checks 2025-12-04T13:58:41.8476625Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_inference_mode 2025-12-04T13:58:41.8476979Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_into_tensor_coalesced 2025-12-04T13:58:41.8477344Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_noncontiguous_input 2025-12-04T13:58:41.8477682Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress 2025-12-04T13:58:41.8478007Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress_cuda 2025-12-04T13:58:41.8478353Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics 2025-12-04T13:58:41.8478676Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics_cuda 2025-12-04T13:58:41.8478999Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_checks 2025-12-04T13:58:41.8479330Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_async 2025-12-04T13:58:41.8479718Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_basics 2025-12-04T13:58:41.8480069Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks 2025-12-04T13:58:41.8480449Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks_cuda 2025-12-04T13:58:41.8480807Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_stress 2025-12-04T13:58:41.8481145Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_op_timeout 2025-12-04T13:58:41.8481483Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_overall_timeout 2025-12-04T13:58:41.8481817Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress 2025-12-04T13:58:41.8482143Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress_cuda 2025-12-04T13:58:41.8482475Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_barrier_implies_wait 2025-12-04T13:58:41.8482814Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_block_current_stream_cuda 2025-12-04T13:58:41.8483147Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics 2025-12-04T13:58:41.8483492Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics_cuda 2025-12-04T13:58:41.8483817Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_checks 2025-12-04T13:58:41.8484148Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress 2025-12-04T13:58:41.8484473Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress_cuda 2025-12-04T13:58:41.8484794Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_empty_tensors 2025-12-04T13:58:41.8485106Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics 2025-12-04T13:58:41.8485423Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics_cuda 2025-12-04T13:58:41.8485744Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_checks 2025-12-04T13:58:41.8486081Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_noncontiguous_input 2025-12-04T13:58:41.8486411Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress 2025-12-04T13:58:41.8486730Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress_cuda 2025-12-04T13:58:41.8487080Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_multi_device_constructor 2025-12-04T13:58:41.8487408Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics 2025-12-04T13:58:41.8487726Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_basics_cuda 2025-12-04T13:58:41.8488045Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_checks 2025-12-04T13:58:41.8488363Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter 
2025-12-04T13:58:41.8488689Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor 2025-12-04T13:58:41.8489038Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor_coalesced 2025-12-04T13:58:41.8489397Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress 2025-12-04T13:58:41.8489748Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress_cuda 2025-12-04T13:58:41.8490067Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics 2025-12-04T13:58:41.8490389Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics_cuda 2025-12-04T13:58:41.8490712Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_checks 2025-12-04T13:58:41.8491023Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress 2025-12-04T13:58:41.8491341Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_stress_cuda 2025-12-04T13:58:41.8491671Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_all_to_all 2025-12-04T13:58:41.8491999Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_complex 2025-12-04T13:58:41.8492342Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_set_gloo_pg_timeout 2025-12-04T13:58:41.8492679Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics 2025-12-04T13:58:41.8493048Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics_cuda 2025-12-04T13:58:41.8493395Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_checks 2025-12-04T13:58:41.8493751Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_cuda_dispatched 2025-12-04T13:58:41.8494086Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics 2025-12-04T13:58:41.8494391Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics_cuda 2025-12-04T13:58:41.8494699Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_checks 2025-12-04T13:58:41.8495010Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_async 2025-12-04T13:58:41.8495337Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_checks 2025-12-04T13:58:41.8495662Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_inference_mode 2025-12-04T13:58:41.8495994Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_into_tensor_coalesced 2025-12-04T13:58:41.8496350Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_noncontiguous_input 2025-12-04T13:58:41.8496669Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress 2025-12-04T13:58:41.8496974Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress_cuda 2025-12-04T13:58:41.8497299Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics 2025-12-04T13:58:41.8497607Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics_cuda 2025-12-04T13:58:41.8497912Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_checks 2025-12-04T13:58:41.8498226Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_async 2025-12-04T13:58:41.8498568Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_basics 2025-12-04T13:58:41.8498894Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks 2025-12-04T13:58:41.8499229Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_checks_cuda 2025-12-04T13:58:41.8499561Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_stress 2025-12-04T13:58:41.8499910Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_op_timeout 2025-12-04T13:58:41.8500226Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_overall_timeout 2025-12-04T13:58:41.8500537Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress 2025-12-04T13:58:41.8500845Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress_cuda 2025-12-04T13:58:41.8501155Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_barrier_implies_wait 2025-12-04T13:58:41.8501489Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_block_current_stream_cuda 2025-12-04T13:58:41.8501799Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics 2025-12-04T13:58:41.8502122Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_basics_cuda 2025-12-04T13:58:41.8502425Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_checks 2025-12-04T13:58:41.8502721Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress 2025-12-04T13:58:41.8503026Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress_cuda 2025-12-04T13:58:41.8503329Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_empty_tensors 2025-12-04T13:58:41.8503623Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics 2025-12-04T13:58:41.8503920Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics_cuda 2025-12-04T13:58:41.8504217Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_checks 2025-12-04T13:58:41.8504524Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_noncontiguous_input 2025-12-04T13:58:41.8504834Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress 2025-12-04T13:58:41.8505147Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress_cuda 2025-12-04T13:58:41.8505435Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_long 2025-12-04T13:58:41.8505731Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_multi_device_constructor 2025-12-04T13:58:41.8506034Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics 2025-12-04T13:58:41.8506331Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_basics_cuda 2025-12-04T13:58:41.8506626Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_checks 2025-12-04T13:58:41.8506917Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter 2025-12-04T13:58:41.8507235Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor 2025-12-04T13:58:41.8507560Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor_coalesced 2025-12-04T13:58:41.8507874Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress 2025-12-04T13:58:41.8508166Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_stress_cuda 2025-12-04T13:58:41.8508461Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics 2025-12-04T13:58:41.8508759Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics_cuda 2025-12-04T13:58:41.8509056Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_checks 2025-12-04T13:58:41.8509351Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress 2025-12-04T13:58:41.8509692Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress_cuda 2025-12-04T13:58:41.8510015Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_all_to_all 2025-12-04T13:58:41.8510320Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_complex 2025-12-04T13:58:41.8510637Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_set_gloo_pg_timeout 2025-12-04T13:58:41.8510935Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_json 2025-12-04T13:58:41.8511221Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_pickle 2025-12-04T13:58:41.8511524Z Running 1 items in 
this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics 2025-12-04T13:58:41.8511849Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics_cuda 2025-12-04T13:58:41.8512173Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_checks 2025-12-04T13:58:41.8512504Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_cuda_dispatched 2025-12-04T13:58:41.8512806Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_bool_tensors 2025-12-04T13:58:41.8513079Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cpu 2025-12-04T13:58:41.8513367Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cuda 2025-12-04T13:58:41.8513667Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_gloo_rank_membership 2025-12-04T13:58:41.8513942Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_gloo_warn_not_in_group 2025-12-04T13:58:41.8514235Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_default 2025-12-04T13:58:41.8514547Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_incremented_gloo_subgroup 2025-12-04T13:58:41.8514852Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_default_pg_gloo 2025-12-04T13:58:41.8515149Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_gloo_new_group 2025-12-04T13:58:41.8515432Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_complex 2025-12-04T13:58:41.8515707Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_mismatch 2025-12-04T13:58:41.8516059Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single 2025-12-04T13:58:41.8516452Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allgather_coalesced 2025-12-04T13:58:41.8516847Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced 2025-12-04T13:58:41.8517230Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_collectives 2025-12-04T13:58:41.8517619Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_default_process_group 2025-12-04T13:58:41.8518046Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends 2025-12-04T13:58:41.8518496Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_optional_backend 2025-12-04T13:58:41.8518915Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_monitored_barrier 2025-12-04T13:58:41.8519267Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync 2025-12-04T13:58:41.8519615Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_duplicate_pg 2025-12-04T13:58:41.8520103Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_sanity_check 2025-12-04T13:58:41.8520278Z 2025-12-04T13:58:41.8520397Z Finished distributed/test_c10d_gloo 1/1 ... [2025-12-04 13:58:41.840082][5234762.819108178], took 19.95min 2025-12-04T13:58:41.8520819Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:58:41.8521210Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:58:41.8521424Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T13:58:41.8521603Z Uploading artifacts took 0.00 seconds 2025-12-04T13:58:41.8521792Z Running distributed/test_c10d_ops_nccl 1/1 ... [2025-12-04 13:58:41.843827][5234762.822868377] 2025-12-04T13:58:41.8521984Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:58:41.8522383Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_ops_nccl.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:58:41.844047] 2025-12-04T13:58:51.1730537Z 2025-12-04T13:58:51.1731447Z distributed/test_c10d_ops_nccl 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_ops_nccl_1.1_9bb7c62b01c00575_.log 2025-12-04T13:58:51.1739957Z Running 30 items in this shard: test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_all_gather_v, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_allgather_base_basics, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_allgather_base_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_allgather_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_allreduce_float8, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_allreduce_in_cudagraph, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_allreduce_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_alltoall_ops_with_cudafree_race, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_barrier, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_broadcast_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_empty_tensors, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_gather_checks, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_gather_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_gather_stress, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_nccl_watchdog_cudagraph, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_scatter_base_basics, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_scatter_base_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_scatter_bfloat16, 
test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_scatter_float8, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_scatter_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_reduce_scatter_v, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_scatter_checks, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_scatter_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_scatter_stress, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_send_recv, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_send_recv_complex, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_send_recv_object_list, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_sparse_allreduce_ops, test/distributed/test_c10d_ops_nccl.py::ProcessGroupNCCLOpTest::test_tensor_register_hook 2025-12-04T13:58:51.1746056Z 2025-12-04T13:58:51.1746239Z Finished distributed/test_c10d_ops_nccl 1/1 ... [2025-12-04 13:58:51.172720][5234772.15175705], took 0.16min 2025-12-04T13:58:51.1755175Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:58:51.1766009Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:58:51.1767555Z Running distributed/elastic/events/lib_test 1/1 ... [2025-12-04 13:58:51.176646][5234772.155687806] 2025-12-04T13:58:51.1767799Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:58:51.1769517Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/events/lib_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:58:51.176847] 2025-12-04T13:58:53.3944505Z 2025-12-04T13:58:53.3945342Z distributed/elastic/events/lib_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.events.lib_test_1.1_3edaaf21764b7565_.log 2025-12-04T13:58:53.3949352Z Running 8 items in this shard: test/distributed/elastic/events/lib_test.py::EventLibTest::test_event_created, test/distributed/elastic/events/lib_test.py::EventLibTest::test_event_deser, test/distributed/elastic/events/lib_test.py::EventLibTest::test_get_or_create_logger, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event_does_not_run_if_invalid_dest, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_created, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_deserialize, test/distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_str 2025-12-04T13:58:53.3952375Z 2025-12-04T13:58:53.3952640Z Finished distributed/elastic/events/lib_test 1/1 ... 
[2025-12-04 13:58:53.394178][5234774.373215253], took 0.04min 2025-12-04T13:58:53.3969920Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:58:53.3981677Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:58:53.3982219Z Running distributed/elastic/metrics/api_test 1/1 ... [2025-12-04 13:58:53.398108][5234774.377149679] 2025-12-04T13:58:53.3982566Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:58:53.3984542Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/metrics/api_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:58:53.398304] 2025-12-04T13:58:55.8167445Z 2025-12-04T13:58:55.8168747Z distributed/elastic/metrics/api_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.metrics.api_test_1.1_d08f7f2dea080f69_.log 2025-12-04T13:58:55.8170891Z Running 3 items in this shard: test/distributed/elastic/metrics/api_test.py::MetricsApiTest::test_get_metric_name, test/distributed/elastic/metrics/api_test.py::MetricsApiTest::test_inheritance, test/distributed/elastic/metrics/api_test.py::MetricsApiTest::test_profile 2025-12-04T13:58:55.8172000Z 2025-12-04T13:58:55.8172637Z Finished distributed/elastic/metrics/api_test 1/1 ... [2025-12-04 13:58:55.816353][5234776.795389625], took 0.04min 2025-12-04T13:58:55.8192721Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:58:55.8204376Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:58:55.8204717Z Running distributed/elastic/multiprocessing/api_test 1/1 ... [2025-12-04 13:58:55.820363][5234776.79940358] 2025-12-04T13:58:55.8204973Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:58:55.8207620Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/multiprocessing/api_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:58:55.820558] 2025-12-04T13:59:16.8696373Z 2025-12-04T13:59:16.8697434Z distributed/elastic/multiprocessing/api_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.multiprocessing.api_test_1.1_293079c1c785d4d6_.log 2025-12-04T13:59:16.8707802Z Running 26 items in this shard: test/distributed/elastic/multiprocessing/api_test.py::RunProcResultsTest::test_get_failures, test/distributed/elastic/multiprocessing/api_test.py::RunProcResultsTest::test_is_failed, test/distributed/elastic/multiprocessing/api_test.py::StdTest::test_from_str_bad_input, test/distributed/elastic/multiprocessing/api_test.py::StdTest::test_from_value, test/distributed/elastic/multiprocessing/api_test.py::StdTest::test_from_value_map, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_args_env_len_mismatch, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_function_large_ret_val, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_function_raise, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_function_with_tensor, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_invalid_log_dir, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_multiprocess_context_close, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_multiprocessing_context_poll_raises_exception, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_pcontext_wait, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_pcontext_wait_on_a_child_thread, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_to_map, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_void_function, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsFuncTest::test_wait_for_all_child_procs_to_exit, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsBinaryTest::test_binary_exit, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsBinaryTest::test_binary_incorrect_entrypoint, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsBinaryTest::test_binary_raises, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsBinaryTest::test_subprocess_context_close, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesAsBinaryTest::test_validate_full_rank, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesListAsFuncTest::test_function, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesListAsBinaryTest::test_binary, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesListAsBinaryTest::test_binary_duplicate_log_filters, test/distributed/elastic/multiprocessing/api_test.py::StartProcessesListAsBinaryTest::test_binary_redirect_and_tee 2025-12-04T13:59:16.8715030Z 2025-12-04T13:59:16.8715329Z Finished distributed/elastic/multiprocessing/api_test 1/1 ... 
[2025-12-04 13:59:16.869352][5234797.848389116], took 0.35min 2025-12-04T13:59:16.8721300Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:59:16.8730008Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:59:16.8732686Z Running distributed/elastic/timer/local_timer_example 1/1 ... [2025-12-04 13:59:16.873140][5234797.852181664] 2025-12-04T13:59:16.8732954Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:59:16.8735047Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/timer/local_timer_example.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:59:16.873337] 2025-12-04T13:59:27.4558026Z 2025-12-04T13:59:27.4559237Z distributed/elastic/timer/local_timer_example 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.timer.local_timer_example_1.1_1a8baab1bb8d84b4_.log 2025-12-04T13:59:27.4561010Z Running 2 items in this shard: test/distributed/elastic/timer/local_timer_example.py::LocalTimerExample::test_example_start_method_spawn, test/distributed/elastic/timer/local_timer_example.py::LocalTimerExample::test_torch_mp_example 2025-12-04T13:59:27.4561871Z 2025-12-04T13:59:27.4562236Z Finished distributed/elastic/timer/local_timer_example 1/1 ... [2025-12-04 13:59:27.455524][5234808.434560796], took 0.18min 2025-12-04T13:59:27.4585901Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:59:27.4596016Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:59:27.4598464Z Running distributed/elastic/timer/local_timer_test 1/1 ... [2025-12-04 13:59:27.459753][5234808.438794379] 2025-12-04T13:59:27.4598844Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:59:27.4600878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/timer/local_timer_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:59:27.459965] 2025-12-04T13:59:33.4336025Z 2025-12-04T13:59:33.4337779Z distributed/elastic/timer/local_timer_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.timer.local_timer_test_1.1_3f18f0368c8ef813_.log 2025-12-04T13:59:33.4343258Z Running 14 items in this shard: test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_client_interaction, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_exception_propagation, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_get_timer_recursive, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_happy_path, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_no_client, test/distributed/elastic/timer/local_timer_test.py::LocalTimerTest::test_timer, test/distributed/elastic/timer/local_timer_test.py::MultiprocessingRequestQueueTest::test_get, test/distributed/elastic/timer/local_timer_test.py::MultiprocessingRequestQueueTest::test_get_less_than_size, test/distributed/elastic/timer/local_timer_test.py::MultiprocessingRequestQueueTest::test_get_size, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_acquire_release, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_expired_timers, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_valid_timers, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_watchdog_call_count, test/distributed/elastic/timer/local_timer_test.py::LocalTimerServerTest::test_watchdog_empty_queue 2025-12-04T13:59:33.4347709Z 2025-12-04T13:59:33.4348030Z Finished distributed/elastic/timer/local_timer_test 1/1 ... [2025-12-04 13:59:33.433203][5234814.412239534], took 0.10min 2025-12-04T13:59:33.4361804Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:59:33.4373747Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:59:33.4374268Z Running distributed/elastic/utils/distributed_test 1/1 ... [2025-12-04 13:59:33.437320][5234814.416361747] 2025-12-04T13:59:33.4374576Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:59:33.4376639Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/utils/distributed_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:59:33.437520] 2025-12-04T13:59:39.0098885Z 2025-12-04T13:59:39.0100322Z distributed/elastic/utils/distributed_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.utils.distributed_test_1.1_01966c024defed56_.log 2025-12-04T13:59:39.0104728Z Running 8 items in this shard: test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_multi, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_no_port_multi, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_single_server, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_timeout_on_server, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_timeout_on_worker, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_create_store_with_libuv_support, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_port_already_in_use_on_server, test/distributed/elastic/utils/distributed_test.py::DistributedUtilTest::test_port_already_in_use_on_worker 2025-12-04T13:59:39.0108819Z 2025-12-04T13:59:39.0109251Z Finished distributed/elastic/utils/distributed_test 1/1 ... [2025-12-04 13:59:39.009647][5234819.988682802], took 0.09min 2025-12-04T13:59:39.0125911Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:59:39.0136512Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:59:39.0138685Z Running distributed/elastic/utils/logging_test 1/1 ... [2025-12-04 13:59:39.013777][5234819.992818205] 2025-12-04T13:59:39.0139001Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:59:39.0140969Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/utils/logging_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:59:39.013972] 2025-12-04T13:59:41.2320099Z 2025-12-04T13:59:41.2321094Z distributed/elastic/utils/logging_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.utils.logging_test_1.1_9edefec145f4d39d_.log 2025-12-04T13:59:41.2322445Z Running 2 items in this shard: test/distributed/elastic/utils/logging_test.py::LoggingTest::test_derive_module_name, test/distributed/elastic/utils/logging_test.py::LoggingTest::test_logger_name 2025-12-04T13:59:41.2323158Z 2025-12-04T13:59:41.2323477Z Finished distributed/elastic/utils/logging_test 1/1 ... [2025-12-04 13:59:41.231685][5234822.210722943], took 0.04min 2025-12-04T13:59:41.2346910Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml 2025-12-04T13:59:41.2356165Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:59:41.2358451Z Running distributed/elastic/utils/util_test 1/1 ... 
[2025-12-04 13:59:41.235753][5234822.214794317] 2025-12-04T13:59:41.2358840Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:59:41.2362847Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/utils/util_test.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:59:41.235945] 2025-12-04T13:59:43.6040070Z 2025-12-04T13:59:43.6040921Z distributed/elastic/utils/util_test 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.utils.util_test_1.1_718f6bc0a60f45bf_.log 2025-12-04T13:59:43.6045587Z Running 12 items in this shard: test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier_hash_store, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier_timeout_operations, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_barrier_timeout_rank_tracing, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_get_all_rank_0, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_get_all_rank_n, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_synchronize, test/distributed/elastic/utils/util_test.py::StoreUtilTest::test_synchronize_hash_store, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger_custom_name, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger_different, test/distributed/elastic/utils/util_test.py::UtilTest::test_get_logger_none 2025-12-04T13:59:43.6056965Z 2025-12-04T13:59:43.6057220Z Finished distributed/elastic/utils/util_test 1/1 ... 
[2025-12-04 13:59:43.603669][5234824.582707756], took 0.04min
2025-12-04T13:59:43.6060607Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/distributed.test_inductor_collectives/distributed.test_inductor_collectives-5592dd52db052605.xml
2025-12-04T13:59:43.6071788Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T13:59:45.7609554Z Running test batch 'tests to run' cost 12328.11 seconds
2025-12-04T13:59:45.7613512Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:45.7617922Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856785_7d73d336d11911f0aa330a4c12374f04
2025-12-04T13:59:47.7791004Z /var/lib/jenkins/pytorch/tools/stats/upload_metrics.py:156: UserWarning: Error uploading metric td_test_failure_stats_v2 to DynamoDB: Unable to locate credentials
2025-12-04T13:59:47.7791848Z warn(f"Error uploading metric {metric_name} to DynamoDB: {e}")
2025-12-04T13:59:47.7792260Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7795720Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea7b196d11911f0aa330a4c12374f04
2025-12-04T13:59:47.7811746Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7812266Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea7f98ad11911f0aa330a4c12374f04
2025-12-04T13:59:47.7829318Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7829901Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea83fc6d11911f0aa330a4c12374f04
2025-12-04T13:59:47.7847202Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7848523Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea88602d11911f0aa330a4c12374f04
2025-12-04T13:59:47.7865030Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7865781Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea8c9f0d11911f0aa330a4c12374f04
2025-12-04T13:59:47.7883044Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7883557Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea91068d11911f0aa330a4c12374f04
2025-12-04T13:59:47.7900623Z Emitting td_test_failure_stats_v2
2025-12-04T13:59:47.7901026Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764856787_7ea956aed11911f0aa330a4c12374f04
2025-12-04T13:59:47.7917351Z distributed/fsdp/test_fsdp_overlap 1/1 failed!
2025-12-04T13:59:47.7917598Z distributed/fsdp/test_fsdp_exec_order 1/1 failed!
2025-12-04T13:59:47.7917813Z distributed/fsdp/test_fsdp_input 1/1 failed!
2025-12-04T13:59:47.7918017Z distributed/fsdp/test_fsdp_traversal 1/1 failed!
2025-12-04T13:59:47.7918226Z distributed/fsdp/test_fsdp_checkpoint 1/1 failed!
2025-12-04T13:59:47.7918433Z distributed/fsdp/test_fsdp_fine_tune 1/1 failed!
2025-12-04T13:59:47.7918649Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed!
2025-12-04T13:59:47.7918863Z distributed/fsdp/test_fsdp_core 1/2 failed!
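The "Unable to locate credentials" messages above come from the metrics uploader failing to find AWS credentials on this runner; /var/lib/jenkins/pytorch/tools/stats/upload_metrics.py downgrades the DynamoDB error to a UserWarning so the job keeps going. A minimal sketch of that pattern is shown below, assuming boto3 is installed; the table name and payload are illustrative, not the real ones used by upload_metrics.py.

    # Hypothetical sketch (not tools/stats/upload_metrics.py itself): emit a metric
    # document to DynamoDB, but downgrade missing-credential errors to a warning so
    # the job can keep running, matching the UserWarning in the log above.
    from warnings import warn

    import boto3
    from botocore.exceptions import ClientError, NoCredentialsError

    def emit_metric(metric_name: str, document: dict) -> None:
        try:
            # Table name is illustrative only.
            table = boto3.resource("dynamodb").Table("torchci-metrics-example")
            table.put_item(Item={"metric_name": metric_name, **document})
        except (NoCredentialsError, ClientError) as e:
            warn(f"Error uploading metric {metric_name} to DynamoDB: {e}")

    # emit_metric("td_test_failure_stats_v2", {"workflow_id": "19922849170"})

With no credentials configured, put_item raises NoCredentialsError and the call degrades to exactly the kind of warning recorded above while the S3 document writes proceed separately.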
2025-12-04T13:59:48.5409329Z
2025-12-04T13:59:48.5409812Z real 205m33.994s
2025-12-04T13:59:48.5410140Z user 1036m3.585s
2025-12-04T13:59:48.5410384Z sys 423m46.480s
2025-12-04T13:59:48.5410635Z + sccache_epilogue
2025-12-04T13:59:48.5410961Z + echo '::group::Sccache Compilation Log'
2025-12-04T13:59:48.5411712Z ##[group]Sccache Compilation Log
2025-12-04T13:59:48.5412111Z + echo '=================== sccache compilation log ==================='
2025-12-04T13:59:48.5413017Z =================== sccache compilation log ===================
2025-12-04T13:59:48.5413649Z + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T13:59:48.5487949Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T13:59:48.5488444Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T13:59:48.5488790Z + sccache --show-stats
2025-12-04T13:59:48.5512685Z Compile requests 339
2025-12-04T13:59:48.5512943Z Compile requests executed 0
2025-12-04T13:59:48.5513155Z Cache hits 0
2025-12-04T13:59:48.5513354Z Cache misses 0
2025-12-04T13:59:48.5513558Z Cache hits rate -
2025-12-04T13:59:48.5513764Z Cache timeouts 0
2025-12-04T13:59:48.5513967Z Cache read errors 0
2025-12-04T13:59:48.5514165Z Forced recaches 0
2025-12-04T13:59:48.5514480Z Cache write errors 0
2025-12-04T13:59:48.5514679Z Cache errors 0
2025-12-04T13:59:48.5514879Z Compilations 0
2025-12-04T13:59:48.5515087Z Compilation failures 0
2025-12-04T13:59:48.5515306Z Non-cacheable compilations 0
2025-12-04T13:59:48.5515512Z Non-cacheable calls 0
2025-12-04T13:59:48.5515719Z Non-compilation calls 339
2025-12-04T13:59:48.5515937Z Unsupported compiler calls 0
2025-12-04T13:59:48.5516156Z Average cache write 0.000 s
2025-12-04T13:59:48.5516375Z Average compiler 0.000 s
2025-12-04T13:59:48.5516593Z Average cache read hit 0.000 s
2025-12-04T13:59:48.5516811Z Failed distributed compilations 0
2025-12-04T13:59:48.5517082Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache"
2025-12-04T13:59:48.5517373Z Use direct/preprocessor mode? yes
2025-12-04T13:59:48.5517592Z Version (client) 0.10.0
2025-12-04T13:59:48.5517805Z Max cache size 10 GiB
2025-12-04T13:59:48.5518032Z + sccache --stop-server
2025-12-04T13:59:48.5536247Z Stopping sccache server...
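The stats block above is sccache's plain two-column "name value" output, and nothing in this job parses it. If you did want to inspect it programmatically, a small sketch like the following would work; it is a hypothetical helper only, assuming sccache is on PATH and pads the two columns with runs of spaces.

    # Hypothetical helper: capture the same `sccache --show-stats` output shown above
    # and split each row into a (name, value) pair on runs of two or more spaces.
    import re
    import subprocess

    def sccache_stats() -> dict[str, str]:
        out = subprocess.run(
            ["sccache", "--show-stats"], capture_output=True, text=True, check=True
        ).stdout
        stats = {}
        for line in out.splitlines():
            parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
            if len(parts) == 2:
                stats[parts[0]] = parts[1]
        return stats

    # A fully cold run like this one would report stats["Cache hits"] == "0"
    # alongside 339 compile requests that were all non-compilation calls.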
2025-12-04T13:59:48.5539316Z Compile requests 339
2025-12-04T13:59:48.5539627Z Compile requests executed 0
2025-12-04T13:59:48.5539809Z Cache hits 0
2025-12-04T13:59:48.5539989Z Cache misses 0
2025-12-04T13:59:48.5540253Z Cache hits rate -
2025-12-04T13:59:48.5540421Z Cache timeouts 0
2025-12-04T13:59:48.5540580Z Cache read errors 0
2025-12-04T13:59:48.5540744Z Forced recaches 0
2025-12-04T13:59:48.5540908Z Cache write errors 0
2025-12-04T13:59:48.5541074Z Cache errors 0
2025-12-04T13:59:48.5541242Z Compilations 0
2025-12-04T13:59:48.5541410Z Compilation failures 0
2025-12-04T13:59:48.5541590Z Non-cacheable compilations 0
2025-12-04T13:59:48.5541762Z Non-cacheable calls 0
2025-12-04T13:59:48.5541928Z Non-compilation calls 339
2025-12-04T13:59:48.5542107Z Unsupported compiler calls 0
2025-12-04T13:59:48.5542283Z Average cache write 0.000 s
2025-12-04T13:59:48.5542472Z Average compiler 0.000 s
2025-12-04T13:59:48.5542648Z Average cache read hit 0.000 s
2025-12-04T13:59:48.5542835Z Failed distributed compilations 0
2025-12-04T13:59:48.5543058Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache"
2025-12-04T13:59:48.5543297Z Use direct/preprocessor mode? yes
2025-12-04T13:59:48.5543471Z Version (client) 0.10.0
2025-12-04T13:59:48.5543652Z Max cache size 10 GiB
2025-12-04T13:59:48.5543864Z + echo ::endgroup::
2025-12-04T13:59:48.5544118Z ##[endgroup]
2025-12-04T13:59:48.5614280Z ##[error]Process completed with exit code 1.
2025-12-04T13:59:48.5640708Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct
2025-12-04T13:59:48.5641019Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct
2025-12-04T13:59:48.5641390Z docker exec -t "8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test"
2025-12-04T13:59:48.5645562Z shell: /usr/bin/bash -e {0}
2025-12-04T13:59:48.5645675Z env:
2025-12-04T13:59:48.5645768Z GIT_DEFAULT_BRANCH: main
2025-12-04T13:59:48.5645903Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts
2025-12-04T13:59:48.5646081Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results
2025-12-04T13:59:48.5646245Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs
2025-12-04T13:59:48.5646767Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host
2025-12-04T13:59:48.5647255Z AWS_DEFAULT_REGION: us-east-1
2025-12-04T13:59:48.5647370Z AWS_REGION: us-east-1
2025-12-04T13:59:48.5647524Z AWS_ACCESS_KEY_ID: ***
2025-12-04T13:59:48.5647676Z AWS_SECRET_ACCESS_KEY: ***
2025-12-04T13:59:48.5649664Z AWS_SESSION_TOKEN: ***
2025-12-04T13:59:48.5649866Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c
2025-12-04T13:59:48.5650060Z ##[endgroup]
2025-12-04T13:59:48.6362141Z ##[group]Run docker exec -t "8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c" sh -c "sudo chown -R 1001:1001 test"
2025-12-04T13:59:48.6362531Z docker exec -t "8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c" sh -c "sudo chown -R 1001:1001 test"
2025-12-04T13:59:48.6366811Z shell: /usr/bin/bash -e {0}
2025-12-04T13:59:48.6366922Z env:
2025-12-04T13:59:48.6367016Z GIT_DEFAULT_BRANCH: main
2025-12-04T13:59:48.6367151Z
RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:48.6367323Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:48.6367486Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:48.6367998Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:48.6368557Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:48.6368671Z AWS_REGION: us-east-1 2025-12-04T13:59:48.6368829Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:48.6368984Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:48.6371025Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:48.6371192Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:48.6371371Z ##[endgroup] 2025-12-04T13:59:48.7167391Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T13:59:48.7167552Z cat test/**/*_toprint.log || true 2025-12-04T13:59:48.7171670Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T13:59:48.7171816Z env: 2025-12-04T13:59:48.7171910Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:48.7172051Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:48.7172224Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:48.7172389Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:48.7172893Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:48.7173433Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:48.7173559Z AWS_REGION: us-east-1 2025-12-04T13:59:48.7173715Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:48.7173864Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:48.7175830Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:48.7175995Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:48.7176175Z ##[endgroup] 2025-12-04T13:59:48.7220921Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T13:59:48.7283043Z Prepare all required actions 2025-12-04T13:59:48.7283434Z Getting action download info 2025-12-04T13:59:49.0496881Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T13:59:49.9118064Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T13:59:50.8288005Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T13:59:50.8288159Z with: 2025-12-04T13:59:50.8288249Z use-gha: true 2025-12-04T13:59:50.8288415Z file-suffix: test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187 2025-12-04T13:59:50.8288602Z s3-bucket: gha-artifacts 2025-12-04T13:59:50.8288711Z env: 2025-12-04T13:59:50.8288803Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:50.8288937Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:50.8289116Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:50.8289319Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:50.8289888Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device 
/dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:50.8290389Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:50.8290511Z AWS_REGION: us-east-1 2025-12-04T13:59:50.8290679Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:50.8290834Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:50.8292807Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:50.8292982Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:50.8293165Z ##[endgroup] 2025-12-04T13:59:50.8325831Z ##[group]Run actions/upload-artifact@v4 2025-12-04T13:59:50.8325971Z with: 2025-12-04T13:59:50.8326271Z name: test-jsons-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip 2025-12-04T13:59:50.8326486Z retention-days: 14 2025-12-04T13:59:50.8326593Z if-no-files-found: warn 2025-12-04T13:59:50.8326701Z path: test/**/*.json 2025-12-04T13:59:50.8326804Z compression-level: 6 2025-12-04T13:59:50.8326905Z overwrite: false 2025-12-04T13:59:50.8327007Z include-hidden-files: false 2025-12-04T13:59:50.8327117Z env: 2025-12-04T13:59:50.8327208Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:50.8327341Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:50.8327514Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:50.8327676Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:50.8328185Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:50.8328674Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:50.8328793Z AWS_REGION: us-east-1 2025-12-04T13:59:50.8328944Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:50.8329098Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:50.8331142Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:50.8331314Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:50.8331564Z ##[endgroup] 2025-12-04T13:59:51.2088861Z With the provided path, there will be 6 files uploaded 2025-12-04T13:59:51.2092246Z Artifact name is valid! 2025-12-04T13:59:51.2093290Z Root directory input is valid! 2025-12-04T13:59:51.4270538Z Beginning upload of artifact content to blob storage 2025-12-04T13:59:51.8176285Z Uploaded bytes 44615 2025-12-04T13:59:51.8897791Z Finished uploading artifact content to blob storage! 2025-12-04T13:59:51.8898965Z SHA256 digest of uploaded artifact zip is 029366cfb8163f844ae937e8b0a3b01a795d0e959338e1e1fb83bf5154f763ec 2025-12-04T13:59:51.8900179Z Finalizing artifact upload 2025-12-04T13:59:52.1027253Z Artifact test-jsons-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip.zip successfully finalized. Artifact ID 4764837803 2025-12-04T13:59:52.1028677Z Artifact test-jsons-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip has been successfully uploaded! Final size is 44615 bytes. 
Artifact ID is 4764837803 2025-12-04T13:59:52.1033096Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922849170/artifacts/4764837803 2025-12-04T13:59:52.1165184Z ##[group]Run actions/upload-artifact@v4 2025-12-04T13:59:52.1165326Z with: 2025-12-04T13:59:52.1165533Z name: test-reports-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip 2025-12-04T13:59:52.1165765Z retention-days: 14 2025-12-04T13:59:52.1165877Z if-no-files-found: ignore 2025-12-04T13:59:52.1166002Z path: test/**/*.xml test/**/*.csv 2025-12-04T13:59:52.1166139Z compression-level: 6 2025-12-04T13:59:52.1166246Z overwrite: false 2025-12-04T13:59:52.1166351Z include-hidden-files: false 2025-12-04T13:59:52.1166468Z env: 2025-12-04T13:59:52.1166560Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:52.1166705Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:52.1166889Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:52.1167069Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:52.1167584Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:52.1168086Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:52.1168207Z AWS_REGION: us-east-1 2025-12-04T13:59:52.1168367Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:52.1168602Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:52.1170664Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:52.1170841Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:52.1171027Z ##[endgroup] 2025-12-04T13:59:52.5433970Z With the provided path, there will be 859 files uploaded 2025-12-04T13:59:52.5436665Z Artifact name is valid! 2025-12-04T13:59:52.5437405Z Root directory input is valid! 2025-12-04T13:59:52.7705680Z Beginning upload of artifact content to blob storage 2025-12-04T13:59:53.4919768Z Uploaded bytes 712772 2025-12-04T13:59:53.5599758Z Finished uploading artifact content to blob storage! 2025-12-04T13:59:53.5601027Z SHA256 digest of uploaded artifact zip is f5d9a4191bc68805afcc704a5c344589ccdcd6c6d18f2c02d81b5874280de6f7 2025-12-04T13:59:53.5601670Z Finalizing artifact upload 2025-12-04T13:59:54.0288743Z Artifact test-reports-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip.zip successfully finalized. Artifact ID 4764838140 2025-12-04T13:59:54.0290321Z Artifact test-reports-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip has been successfully uploaded! Final size is 712772 bytes. 
Artifact ID is 4764838140 2025-12-04T13:59:54.0295959Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922849170/artifacts/4764838140 2025-12-04T13:59:54.0439861Z ##[group]Run actions/upload-artifact@v4 2025-12-04T13:59:54.0440130Z with: 2025-12-04T13:59:54.0440368Z name: logs-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip 2025-12-04T13:59:54.0440638Z retention-days: 14 2025-12-04T13:59:54.0440783Z if-no-files-found: ignore 2025-12-04T13:59:54.0440940Z path: usage_log.txt test/**/*.log 2025-12-04T13:59:54.0441107Z compression-level: 6 2025-12-04T13:59:54.0441243Z overwrite: false 2025-12-04T13:59:54.0441384Z include-hidden-files: false 2025-12-04T13:59:54.0441530Z env: 2025-12-04T13:59:54.0441648Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:54.0441835Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:54.0449504Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:54.0449961Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:54.0450499Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:54.0451089Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:54.0451213Z AWS_REGION: us-east-1 2025-12-04T13:59:54.0451382Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:54.0451552Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:54.0453593Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:54.0453775Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:54.0453965Z ##[endgroup] 2025-12-04T13:59:54.4438573Z Multiple search paths detected. Calculating the least common ancestor of all paths 2025-12-04T13:59:54.4439512Z The least common ancestor is /home/runner/_work/pytorch/pytorch. This will be the root directory of the artifact 2025-12-04T13:59:54.4440092Z With the provided path, there will be 84 files uploaded 2025-12-04T13:59:54.4443002Z Artifact name is valid! 2025-12-04T13:59:54.4443567Z Root directory input is valid! 2025-12-04T13:59:54.6704991Z Beginning upload of artifact content to blob storage 2025-12-04T13:59:55.3022576Z Uploaded bytes 800413 2025-12-04T13:59:55.3685064Z Finished uploading artifact content to blob storage! 2025-12-04T13:59:55.3686259Z SHA256 digest of uploaded artifact zip is b50b3de5526cf4546d5fc4e5d0d4023cc70603fbf368d13ce20e8b9aa165453d 2025-12-04T13:59:55.3687061Z Finalizing artifact upload 2025-12-04T13:59:55.5133067Z Artifact logs-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip.zip successfully finalized. Artifact ID 4764838526 2025-12-04T13:59:55.5134123Z Artifact logs-runattempt1-test-distributed-2-3-linux.rocm.gpu.gfx942.4.b_57116213187.zip has been successfully uploaded! Final size is 800413 bytes. Artifact ID is 4764838526 2025-12-04T13:59:55.5138549Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922849170/artifacts/4764838526 2025-12-04T13:59:55.5264083Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T13:59:55.5264313Z # shellcheck disable=SC2156 2025-12-04T13:59:55.5264600Z find . 
-iname "core.[1-9]*" -exec docker exec "${CONTAINER_NAME}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T13:59:55.5269461Z shell: /usr/bin/bash -e {0} 2025-12-04T13:59:55.5269820Z env: 2025-12-04T13:59:55.5269930Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:55.5270083Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:55.5270273Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:55.5270451Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:55.5271013Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:55.5271517Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:55.5271714Z AWS_REGION: us-east-1 2025-12-04T13:59:55.5271886Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:55.5272057Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:55.5274059Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:55.5274243Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:55.5274436Z ##[endgroup] 2025-12-04T13:59:55.6561805Z ##[group]Run actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 2025-12-04T13:59:55.6562031Z with: 2025-12-04T13:59:55.6562188Z name: coredumps-distributed-2-3-linux.rocm.gpu.gfx942.4.b 2025-12-04T13:59:55.6562386Z retention-days: 14 2025-12-04T13:59:55.6562509Z if-no-files-found: ignore 2025-12-04T13:59:55.6562638Z path: ./**/core.[1-9]* 2025-12-04T13:59:55.6562764Z compression-level: 6 2025-12-04T13:59:55.6562887Z overwrite: false 2025-12-04T13:59:55.6563010Z include-hidden-files: false 2025-12-04T13:59:55.6563142Z env: 2025-12-04T13:59:55.6563245Z GIT_DEFAULT_BRANCH: main 2025-12-04T13:59:55.6563413Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T13:59:55.6563616Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T13:59:55.6563806Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T13:59:55.6564405Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD128 --device /dev/dri/renderD136 --device /dev/dri/renderD144 --device /dev/dri/renderD152 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T13:59:55.6564919Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T13:59:55.6565045Z AWS_REGION: us-east-1 2025-12-04T13:59:55.6565237Z AWS_ACCESS_KEY_ID: *** 2025-12-04T13:59:55.6565400Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T13:59:55.6567387Z AWS_SESSION_TOKEN: *** 2025-12-04T13:59:55.6567570Z CONTAINER_NAME: 8ac2e1cca5c5a27e1632f345b696937e28036af305fa8c40ccee0a9f11fed68c 2025-12-04T13:59:55.6567761Z ##[endgroup] 2025-12-04T13:59:59.5985251Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-12-04T13:59:59.6150129Z Post job cleanup. 2025-12-04T13:59:59.6162629Z Post job cleanup. 2025-12-04T13:59:59.6365267Z Logging out of registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T13:59:59.6578813Z Post job cleanup. 2025-12-04T13:59:59.7209075Z Post job cleanup. 2025-12-04T13:59:59.7228715Z Post job cleanup. 
2025-12-04T13:59:59.7696723Z [command]/usr/bin/git version 2025-12-04T13:59:59.7721749Z git version 2.52.0 2025-12-04T13:59:59.7741400Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/8fbd6a03-a64a-4207-be18-c4cca24ee4fc/.gitconfig' 2025-12-04T13:59:59.7746855Z Temporarily overriding HOME='/home/runner/_work/_temp/8fbd6a03-a64a-4207-be18-c4cca24ee4fc' before making global git config changes 2025-12-04T13:59:59.7747382Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T13:59:59.7749467Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T13:59:59.7775932Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T13:59:59.7809766Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T13:59:59.8005598Z Entering 'android/libs/fbjni' 2025-12-04T13:59:59.8031238Z Entering 'third_party/FP16' 2025-12-04T13:59:59.8056150Z Entering 'third_party/FXdiv' 2025-12-04T13:59:59.8079889Z Entering 'third_party/NNPACK' 2025-12-04T13:59:59.8106054Z Entering 'third_party/NVTX' 2025-12-04T13:59:59.8132378Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T13:59:59.8157656Z Entering 'third_party/XNNPACK' 2025-12-04T13:59:59.8193215Z Entering 'third_party/aiter' 2025-12-04T13:59:59.8227428Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T13:59:59.8260625Z Entering 'third_party/benchmark' 2025-12-04T13:59:59.8285979Z Entering 'third_party/composable_kernel' 2025-12-04T13:59:59.8315482Z Entering 'third_party/cpp-httplib' 2025-12-04T13:59:59.8339127Z Entering 'third_party/cpuinfo' 2025-12-04T13:59:59.8363053Z Entering 'third_party/cudnn_frontend' 2025-12-04T13:59:59.8386750Z Entering 'third_party/cutlass' 2025-12-04T13:59:59.8414976Z Entering 'third_party/fbgemm' 2025-12-04T13:59:59.8440668Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T13:59:59.8465175Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T13:59:59.8489840Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T13:59:59.8515039Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T13:59:59.8541667Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T13:59:59.8564353Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T13:59:59.8590582Z Entering 'third_party/fbgemm/external/json' 2025-12-04T13:59:59.8615532Z Entering 'third_party/flash-attention' 2025-12-04T13:59:59.8641024Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T13:59:59.8666718Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T13:59:59.8694611Z Entering 'third_party/flatbuffers' 2025-12-04T13:59:59.8718235Z Entering 'third_party/fmt' 2025-12-04T13:59:59.8749848Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T13:59:59.8774562Z Entering 'third_party/gloo' 2025-12-04T13:59:59.8798083Z Entering 'third_party/googletest' 2025-12-04T13:59:59.8821021Z Entering 'third_party/ideep' 2025-12-04T13:59:59.8848939Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T13:59:59.8886210Z Entering 'third_party/ittapi' 2025-12-04T13:59:59.8908576Z Entering 'third_party/kineto' 2025-12-04T13:59:59.8940942Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T13:59:59.8964546Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T13:59:59.8989125Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T13:59:59.9013652Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T13:59:59.9043544Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T13:59:59.9066955Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T13:59:59.9095825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T13:59:59.9121779Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T13:59:59.9154117Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T13:59:59.9180367Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T13:59:59.9209056Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T13:59:59.9237564Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:59:59.9263473Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:59:59.9291172Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T13:59:59.9315686Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T13:59:59.9341892Z Entering 'third_party/kleidiai' 2025-12-04T13:59:59.9369334Z Entering 'third_party/mimalloc' 2025-12-04T13:59:59.9393346Z Entering 'third_party/nlohmann' 2025-12-04T13:59:59.9416671Z Entering 'third_party/onnx' 2025-12-04T13:59:59.9447395Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T13:59:59.9478034Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T13:59:59.9503670Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T13:59:59.9549832Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T13:59:59.9586420Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T13:59:59.9623588Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T13:59:59.9656375Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T13:59:59.9682132Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T13:59:59.9703412Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T13:59:59.9724363Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T13:59:59.9753397Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T13:59:59.9787804Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T13:59:59.9825814Z Entering 'third_party/pocketfft' 2025-12-04T13:59:59.9853858Z Entering 'third_party/protobuf' 2025-12-04T13:59:59.9883390Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T13:59:59.9904720Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T13:59:59.9930901Z Entering 'third_party/psimd' 2025-12-04T13:59:59.9960621Z Entering 'third_party/pthreadpool' 2025-12-04T13:59:59.9994579Z Entering 'third_party/pybind11' 2025-12-04T14:00:00.0020837Z Entering 'third_party/python-peachpy' 2025-12-04T14:00:00.0049882Z Entering 'third_party/sleef' 2025-12-04T14:00:00.0074861Z Entering 'third_party/tensorpipe' 2025-12-04T14:00:00.0101938Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:00:00.0130446Z Entering 
'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:00:00.0159179Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:00:00.0183550Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:00:00.0208661Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:00:00.0254423Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T14:00:00.0270573Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0280317Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T14:00:00.0306815Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T14:00:00.0518885Z Entering 'android/libs/fbjni' 2025-12-04T14:00:00.0547773Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0572638Z Entering 'third_party/FP16' 2025-12-04T14:00:00.0587012Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0608465Z Entering 'third_party/FXdiv' 2025-12-04T14:00:00.0625486Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0644867Z Entering 'third_party/NNPACK' 2025-12-04T14:00:00.0663958Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0683448Z Entering 'third_party/NVTX' 2025-12-04T14:00:00.0696515Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0716717Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:00:00.0730075Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0749191Z Entering 'third_party/XNNPACK' 2025-12-04T14:00:00.0765614Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0789370Z Entering 'third_party/aiter' 2025-12-04T14:00:00.0803857Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0821354Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:00:00.0839085Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0861724Z Entering 'third_party/benchmark' 2025-12-04T14:00:00.0876148Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0896323Z Entering 'third_party/composable_kernel' 2025-12-04T14:00:00.0908678Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0931883Z Entering 'third_party/cpp-httplib' 2025-12-04T14:00:00.0945987Z http.https://github.com/.extraheader 2025-12-04T14:00:00.0963726Z Entering 'third_party/cpuinfo' 2025-12-04T14:00:00.0983445Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1004154Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:00:00.1018310Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1037031Z Entering 'third_party/cutlass' 2025-12-04T14:00:00.1052160Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1075550Z Entering 'third_party/fbgemm' 2025-12-04T14:00:00.1090244Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1114302Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:00:00.1129366Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1158417Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:00:00.1176459Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1196476Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:00:00.1209486Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1226149Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:00:00.1245728Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1267491Z 
Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:00:00.1279642Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1296329Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:00:00.1308530Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1326519Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:00:00.1340544Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1360325Z Entering 'third_party/flash-attention' 2025-12-04T14:00:00.1373289Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1394922Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:00:00.1412321Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1431469Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:00:00.1446932Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1471540Z Entering 'third_party/flatbuffers' 2025-12-04T14:00:00.1485468Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1515010Z Entering 'third_party/fmt' 2025-12-04T14:00:00.1532655Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1561487Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:00:00.1587525Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1609398Z Entering 'third_party/gloo' 2025-12-04T14:00:00.1624277Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1644003Z Entering 'third_party/googletest' 2025-12-04T14:00:00.1659561Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1675538Z Entering 'third_party/ideep' 2025-12-04T14:00:00.1690420Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1705753Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:00:00.1718449Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1744378Z Entering 'third_party/ittapi' 2025-12-04T14:00:00.1757712Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1780069Z Entering 'third_party/kineto' 2025-12-04T14:00:00.1793097Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1815450Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:00:00.1830747Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1850108Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:00:00.1863751Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1883304Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:00:00.1896802Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1914785Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:00:00.1940535Z http.https://github.com/.extraheader 2025-12-04T14:00:00.1959987Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:00:00.1983836Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2001682Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:00:00.2015589Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2037388Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:00:00.2050914Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2068217Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:00:00.2083378Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2103931Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:00:00.2117501Z http.https://github.com/.extraheader 
2025-12-04T14:00:00.2135132Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:00:00.2151433Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2168928Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:00:00.2182544Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2198749Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:00.2216542Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2235884Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:00.2248893Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2271662Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:00:00.2285105Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2302169Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:00:00.2322157Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2339778Z Entering 'third_party/kleidiai' 2025-12-04T14:00:00.2354402Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2373092Z Entering 'third_party/mimalloc' 2025-12-04T14:00:00.2386934Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2408689Z Entering 'third_party/nlohmann' 2025-12-04T14:00:00.2422455Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2441538Z Entering 'third_party/onnx' 2025-12-04T14:00:00.2454291Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2475127Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:00:00.2490791Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2513363Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:00:00.2529984Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2547265Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:00:00.2572730Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2589833Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:00:00.2602681Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2621125Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:00:00.2633747Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2652633Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:00:00.2665774Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2682356Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:00:00.2696102Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2712216Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:00:00.2724195Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2741361Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:00:00.2752694Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2768609Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:00.2786126Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2805046Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:00.2819446Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2837827Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:00:00.2850964Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2879292Z Entering 'third_party/pocketfft' 
2025-12-04T14:00:00.2894514Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2913787Z Entering 'third_party/protobuf' 2025-12-04T14:00:00.2927376Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2948220Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:00:00.2962806Z http.https://github.com/.extraheader 2025-12-04T14:00:00.2981097Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:00:00.2995607Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3014311Z Entering 'third_party/psimd' 2025-12-04T14:00:00.3028363Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3052330Z Entering 'third_party/pthreadpool' 2025-12-04T14:00:00.3064907Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3081908Z Entering 'third_party/pybind11' 2025-12-04T14:00:00.3097193Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3113899Z Entering 'third_party/python-peachpy' 2025-12-04T14:00:00.3127686Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3144236Z Entering 'third_party/sleef' 2025-12-04T14:00:00.3157742Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3182803Z Entering 'third_party/tensorpipe' 2025-12-04T14:00:00.3196886Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3215539Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:00:00.3228969Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3246033Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:00:00.3260444Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3279446Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:00:00.3291601Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3312498Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:00:00.3323777Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3343167Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:00:00.3356912Z http.https://github.com/.extraheader 2025-12-04T14:00:00.3396001Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.3419998Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T14:00:00.3587618Z Entering 'android/libs/fbjni' 2025-12-04T14:00:00.3601459Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T14:00:00.3610753Z Entering 'third_party/FP16' 2025-12-04T14:00:00.3621716Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T14:00:00.3633161Z Entering 'third_party/FXdiv' 2025-12-04T14:00:00.3643440Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T14:00:00.3655378Z Entering 'third_party/NNPACK' 2025-12-04T14:00:00.3666588Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T14:00:00.3676853Z Entering 'third_party/NVTX' 2025-12-04T14:00:00.3688555Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T14:00:00.3698492Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:00:00.3710486Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T14:00:00.3722083Z Entering 'third_party/XNNPACK' 2025-12-04T14:00:00.3734145Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T14:00:00.3749628Z Entering 'third_party/aiter' 2025-12-04T14:00:00.3760018Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T14:00:00.3770491Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:00:00.3780546Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T14:00:00.3800595Z Entering 'third_party/benchmark' 2025-12-04T14:00:00.3810704Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:00:00.3823795Z Entering 'third_party/composable_kernel' 2025-12-04T14:00:00.3833691Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T14:00:00.3846114Z Entering 'third_party/cpp-httplib' 2025-12-04T14:00:00.3865266Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T14:00:00.3872735Z Entering 'third_party/cpuinfo' 2025-12-04T14:00:00.3882580Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T14:00:00.3893414Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:00:00.3903792Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T14:00:00.3912860Z Entering 'third_party/cutlass' 2025-12-04T14:00:00.3923558Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T14:00:00.3936781Z Entering 'third_party/fbgemm' 2025-12-04T14:00:00.3948625Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T14:00:00.3961095Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:00:00.3976211Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T14:00:00.3991282Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:00:00.4007801Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T14:00:00.4021847Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:00:00.4034413Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T14:00:00.4045318Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:00:00.4056867Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T14:00:00.4069149Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:00:00.4079371Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T14:00:00.4088326Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:00:00.4101780Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T14:00:00.4110725Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:00:00.4123990Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T14:00:00.4135317Z Entering 
'third_party/flash-attention' 2025-12-04T14:00:00.4144790Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T14:00:00.4155091Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:00:00.4163399Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T14:00:00.4178651Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:00:00.4191599Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T14:00:00.4206073Z Entering 'third_party/flatbuffers' 2025-12-04T14:00:00.4216129Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T14:00:00.4228058Z Entering 'third_party/fmt' 2025-12-04T14:00:00.4238812Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T14:00:00.4254338Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:00:00.4264555Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T14:00:00.4273961Z Entering 'third_party/gloo' 2025-12-04T14:00:00.4283707Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T14:00:00.4294429Z Entering 'third_party/googletest' 2025-12-04T14:00:00.4306975Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:00.4315885Z Entering 'third_party/ideep' 2025-12-04T14:00:00.4325884Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T14:00:00.4333917Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:00:00.4349306Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T14:00:00.4369724Z Entering 'third_party/ittapi' 2025-12-04T14:00:00.4379813Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T14:00:00.4390030Z Entering 'third_party/kineto' 2025-12-04T14:00:00.4402853Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T14:00:00.4412184Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:00:00.4423280Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T14:00:00.4431582Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:00:00.4442332Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T14:00:00.4452039Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:00:00.4461710Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T14:00:00.4470212Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:00:00.4484285Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T14:00:00.4492946Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:00:00.4502263Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T14:00:00.4510687Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:00:00.4523792Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T14:00:00.4535294Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:00:00.4548151Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T14:00:00.4560593Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:00:00.4571644Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:00.4591393Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:00:00.4607576Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T14:00:00.4621623Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:00:00.4636317Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T14:00:00.4645659Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:00:00.4659757Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T14:00:00.4670077Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:00.4681635Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T14:00:00.4696342Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:00.4708037Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T14:00:00.4723454Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:00:00.4733664Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T14:00:00.4742643Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:00:00.4755874Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T14:00:00.4769246Z Entering 'third_party/kleidiai' 2025-12-04T14:00:00.4781027Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T14:00:00.4792526Z Entering 'third_party/mimalloc' 
2025-12-04T14:00:00.4803933Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T14:00:00.4812773Z Entering 'third_party/nlohmann' 2025-12-04T14:00:00.4824412Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T14:00:00.4838695Z Entering 'third_party/onnx' 2025-12-04T14:00:00.4849981Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T14:00:00.4866487Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:00:00.4876852Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:00:00.4888190Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:00:00.4898669Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T14:00:00.4907762Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:00:00.4919554Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:00:00.4933780Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:00:00.4948308Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:00.4956832Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:00:00.4967627Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T14:00:00.4976811Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:00:00.4985546Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T14:00:00.4995949Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:00:00.5008051Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T14:00:00.5017059Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:00:00.5031691Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T14:00:00.5040843Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:00:00.5051043Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T14:00:00.5060724Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:00.5073103Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T14:00:00.5080960Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:00.5091056Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T14:00:00.5104311Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:00:00.5114196Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T14:00:00.5133092Z Entering 'third_party/pocketfft' 2025-12-04T14:00:00.5143458Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T14:00:00.5152564Z Entering 'third_party/protobuf' 2025-12-04T14:00:00.5163065Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T14:00:00.5173966Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:00:00.5183677Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:00:00.5193772Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:00:00.5203715Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:00.5218601Z Entering 'third_party/psimd' 2025-12-04T14:00:00.5232463Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T14:00:00.5242812Z Entering 'third_party/pthreadpool' 2025-12-04T14:00:00.5252772Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T14:00:00.5266317Z Entering 'third_party/pybind11' 2025-12-04T14:00:00.5275521Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:00:00.5285424Z Entering 'third_party/python-peachpy' 2025-12-04T14:00:00.5294835Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T14:00:00.5303514Z Entering 'third_party/sleef' 2025-12-04T14:00:00.5313718Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T14:00:00.5322712Z Entering 'third_party/tensorpipe' 2025-12-04T14:00:00.5332975Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T14:00:00.5342624Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:00:00.5352153Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:00.5362200Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:00:00.5376823Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T14:00:00.5386471Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:00:00.5397194Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T14:00:00.5409960Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:00:00.5429831Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:00:00.5439887Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:00:00.5449883Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T14:00:00.5479298Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5499261Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5515760Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5533393Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5552941Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5568089Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5583439Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5598202Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5612771Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5625381Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5640400Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5655073Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5671648Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5685991Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5701398Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5715547Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5735931Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5751058Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5766218Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5780103Z 
[command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5795675Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5810682Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5823449Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5837038Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5851046Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5873535Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5888374Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5901416Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5916313Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5934975Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5954201Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5971337Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5984947Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.5999250Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6012754Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6026820Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6042323Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6058552Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6077377Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6092598Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6108546Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6123809Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6140576Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6154870Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6171063Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6186764Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6201391Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6215846Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6230528Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6246221Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6261530Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6275636Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6290477Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6308607Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6323574Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6339100Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6354255Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6370520Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6385408Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6401117Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6416175Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6430195Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6449877Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6467481Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6481931Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6494560Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6509200Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6524114Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6537485Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6551737Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6566569Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6581234Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6596309Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6611721Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6625931Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6640687Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6655082Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6668346Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6682966Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6698253Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6718303Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:00.6808218Z Post job cleanup. 
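Condensed, the post-job cleanup that follows repeats the same credential scrub that actions/checkout has just finished running against the main repository and every submodule: it copies the runner's .gitconfig into a scratch directory and temporarily points HOME at it, marks the workspace as a safe.directory, then strips any core.sshCommand and http.https://github.com/.extraheader entries so the ***-masked auth token does not persist on the runner. A minimal sketch of that sequence, using the same git invocations recorded in this log and assuming the workspace path /home/runner/_work/pytorch/pytorch shown above:

  git config --global --add safe.directory /home/runner/_work/pytorch/pytorch
  git config --local --name-only --get-regexp 'core\.sshCommand'
  git submodule foreach --recursive sh -c \
    "git config --local --name-only --get-regexp 'core\.sshCommand' \
     && git config --local --unset-all 'core.sshCommand' || :"
  git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader'
  git config --local --unset-all 'http.https://github.com/.extraheader'
  git submodule foreach --recursive sh -c \
    "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' \
     && git config --local --unset-all 'http.https://github.com/.extraheader' || :"

The trailing '|| :' in the foreach commands keeps the recursive walk from aborting on submodules where the key was never set, which is why the per-submodule "Entering ..." lines above appear even when no extraheader or sshCommand value is printed beneath them.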
2025-12-04T14:00:00.7265560Z [command]/usr/bin/git version 2025-12-04T14:00:00.7294954Z git version 2.52.0 2025-12-04T14:00:00.7321881Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/9e0999af-01d0-4c57-a4e0-716bc118887e/.gitconfig' 2025-12-04T14:00:00.7328020Z Temporarily overriding HOME='/home/runner/_work/_temp/9e0999af-01d0-4c57-a4e0-716bc118887e' before making global git config changes 2025-12-04T14:00:00.7328359Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T14:00:00.7330832Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T14:00:00.7360757Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T14:00:00.7381163Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T14:00:00.7580681Z Entering 'android/libs/fbjni' 2025-12-04T14:00:00.7612307Z Entering 'third_party/FP16' 2025-12-04T14:00:00.7641095Z Entering 'third_party/FXdiv' 2025-12-04T14:00:00.7663502Z Entering 'third_party/NNPACK' 2025-12-04T14:00:00.7694871Z Entering 'third_party/NVTX' 2025-12-04T14:00:00.7716991Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:00:00.7737471Z Entering 'third_party/XNNPACK' 2025-12-04T14:00:00.7762799Z Entering 'third_party/aiter' 2025-12-04T14:00:00.7787588Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:00:00.7825030Z Entering 'third_party/benchmark' 2025-12-04T14:00:00.7855277Z Entering 'third_party/composable_kernel' 2025-12-04T14:00:00.7885928Z Entering 'third_party/cpp-httplib' 2025-12-04T14:00:00.7908317Z Entering 'third_party/cpuinfo' 2025-12-04T14:00:00.7936584Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:00:00.7963975Z Entering 'third_party/cutlass' 2025-12-04T14:00:00.7991750Z Entering 'third_party/fbgemm' 2025-12-04T14:00:00.8016328Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:00:00.8040607Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:00:00.8073420Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:00:00.8094515Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:00:00.8130070Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:00:00.8159919Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:00:00.8183322Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:00:00.8221098Z Entering 'third_party/flash-attention' 2025-12-04T14:00:00.8248947Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:00:00.8281874Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:00:00.8312498Z Entering 'third_party/flatbuffers' 2025-12-04T14:00:00.8348947Z Entering 'third_party/fmt' 2025-12-04T14:00:00.8375837Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:00:00.8398384Z Entering 'third_party/gloo' 2025-12-04T14:00:00.8425099Z Entering 'third_party/googletest' 2025-12-04T14:00:00.8455931Z Entering 'third_party/ideep' 2025-12-04T14:00:00.8480057Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:00:00.8510895Z Entering 'third_party/ittapi' 2025-12-04T14:00:00.8540447Z Entering 'third_party/kineto' 2025-12-04T14:00:00.8563057Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:00:00.8592484Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:00:00.8624576Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:00:00.8651200Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:00:00.8677753Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:00:00.8703010Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:00:00.8725389Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:00:00.8760163Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:00:00.8784474Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:00:00.8804616Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:00:00.8827197Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:00:00.8852375Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:00.8880881Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:00.8914770Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:00:00.8949291Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:00:00.8980506Z Entering 'third_party/kleidiai' 2025-12-04T14:00:00.9009511Z Entering 'third_party/mimalloc' 2025-12-04T14:00:00.9036091Z Entering 'third_party/nlohmann' 2025-12-04T14:00:00.9064010Z Entering 'third_party/onnx' 2025-12-04T14:00:00.9099366Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:00:00.9128366Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:00:00.9165159Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:00:00.9190715Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:00:00.9217720Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:00:00.9246329Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:00:00.9271923Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:00:00.9299232Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:00:00.9327626Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:00:00.9358186Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:00.9387005Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:00.9422514Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:00:00.9460368Z Entering 'third_party/pocketfft' 2025-12-04T14:00:00.9488397Z Entering 'third_party/protobuf' 2025-12-04T14:00:00.9515385Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:00:00.9540803Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:00:00.9568977Z Entering 'third_party/psimd' 2025-12-04T14:00:00.9596211Z Entering 'third_party/pthreadpool' 2025-12-04T14:00:00.9623484Z Entering 'third_party/pybind11' 2025-12-04T14:00:00.9653626Z Entering 'third_party/python-peachpy' 2025-12-04T14:00:00.9676706Z Entering 'third_party/sleef' 2025-12-04T14:00:00.9702096Z Entering 'third_party/tensorpipe' 2025-12-04T14:00:00.9728238Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:00:00.9750288Z Entering 
'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:00:00.9774082Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:00:00.9797371Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:00:00.9821479Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:00:00.9867057Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T14:00:00.9896274Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T14:00:01.0085973Z Entering 'android/libs/fbjni' 2025-12-04T14:00:01.0117839Z Entering 'third_party/FP16' 2025-12-04T14:00:01.0141427Z Entering 'third_party/FXdiv' 2025-12-04T14:00:01.0165385Z Entering 'third_party/NNPACK' 2025-12-04T14:00:01.0185955Z Entering 'third_party/NVTX' 2025-12-04T14:00:01.0210042Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:00:01.0231567Z Entering 'third_party/XNNPACK' 2025-12-04T14:00:01.0260562Z Entering 'third_party/aiter' 2025-12-04T14:00:01.0289253Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:00:01.0318547Z Entering 'third_party/benchmark' 2025-12-04T14:00:01.0348652Z Entering 'third_party/composable_kernel' 2025-12-04T14:00:01.0377309Z Entering 'third_party/cpp-httplib' 2025-12-04T14:00:01.0405551Z Entering 'third_party/cpuinfo' 2025-12-04T14:00:01.0429725Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:00:01.0462235Z Entering 'third_party/cutlass' 2025-12-04T14:00:01.0495703Z Entering 'third_party/fbgemm' 2025-12-04T14:00:01.0526622Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T14:00:01.0549423Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:00:01.0578510Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:00:01.0601361Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:00:01.0633616Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:00:01.0657464Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:00:01.0678834Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:00:01.0704524Z Entering 'third_party/flash-attention' 2025-12-04T14:00:01.0728555Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:00:01.0752568Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:00:01.0786539Z Entering 'third_party/flatbuffers' 2025-12-04T14:00:01.0810895Z Entering 'third_party/fmt' 2025-12-04T14:00:01.0831628Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:00:01.0853554Z Entering 'third_party/gloo' 2025-12-04T14:00:01.0876784Z Entering 'third_party/googletest' 2025-12-04T14:00:01.0903466Z Entering 'third_party/ideep' 2025-12-04T14:00:01.0928685Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:00:01.0957182Z Entering 'third_party/ittapi' 2025-12-04T14:00:01.0983789Z Entering 'third_party/kineto' 2025-12-04T14:00:01.1017803Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:00:01.1040285Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:00:01.1064164Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:00:01.1086450Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:00:01.1116208Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:00:01.1144202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:00:01.1174655Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:00:01.1195406Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:00:01.1222665Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:00:01.1245630Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:00:01.1271791Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T14:00:01.1295948Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:01.1323085Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:01.1356164Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:00:01.1379836Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:00:01.1405863Z Entering 'third_party/kleidiai' 2025-12-04T14:00:01.1432733Z Entering 'third_party/mimalloc' 2025-12-04T14:00:01.1458486Z Entering 'third_party/nlohmann' 2025-12-04T14:00:01.1484294Z Entering 'third_party/onnx' 2025-12-04T14:00:01.1513811Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:00:01.1539430Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:00:01.1562738Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:00:01.1589666Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:00:01.1613033Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:00:01.1635657Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:00:01.1655964Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:00:01.1675438Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:00:01.1698970Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:00:01.1724167Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:01.1749204Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:01.1779191Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:00:01.1809153Z Entering 'third_party/pocketfft' 2025-12-04T14:00:01.1836045Z Entering 'third_party/protobuf' 2025-12-04T14:00:01.1860975Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:00:01.1884603Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:00:01.1916142Z Entering 'third_party/psimd' 2025-12-04T14:00:01.1941064Z Entering 'third_party/pthreadpool' 2025-12-04T14:00:01.1963268Z Entering 'third_party/pybind11' 2025-12-04T14:00:01.1993411Z Entering 'third_party/python-peachpy' 2025-12-04T14:00:01.2014011Z Entering 'third_party/sleef' 2025-12-04T14:00:01.2035389Z Entering 'third_party/tensorpipe' 2025-12-04T14:00:01.2059891Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:00:01.2095049Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:00:01.2124191Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:00:01.2146211Z Entering 'third_party/tensorpipe/third_party/pybind11' 
2025-12-04T14:00:01.2171862Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:00:01.2218029Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.2241262Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T14:00:01.2411983Z Entering 'android/libs/fbjni' 2025-12-04T14:00:01.2423391Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T14:00:01.2432711Z Entering 'third_party/FP16' 2025-12-04T14:00:01.2443459Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T14:00:01.2455928Z Entering 'third_party/FXdiv' 2025-12-04T14:00:01.2467693Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T14:00:01.2475881Z Entering 'third_party/NNPACK' 2025-12-04T14:00:01.2484957Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T14:00:01.2495489Z Entering 'third_party/NVTX' 2025-12-04T14:00:01.2505313Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T14:00:01.2514269Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T14:00:01.2523487Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T14:00:01.2531974Z Entering 'third_party/XNNPACK' 2025-12-04T14:00:01.2543262Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T14:00:01.2563622Z Entering 'third_party/aiter' 2025-12-04T14:00:01.2574471Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T14:00:01.2586128Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T14:00:01.2597440Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T14:00:01.2610733Z Entering 'third_party/benchmark' 2025-12-04T14:00:01.2620756Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:00:01.2630191Z Entering 'third_party/composable_kernel' 2025-12-04T14:00:01.2646397Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T14:00:01.2660285Z Entering 'third_party/cpp-httplib' 2025-12-04T14:00:01.2674239Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T14:00:01.2683510Z Entering 'third_party/cpuinfo' 2025-12-04T14:00:01.2693837Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T14:00:01.2702720Z Entering 'third_party/cudnn_frontend' 2025-12-04T14:00:01.2713937Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T14:00:01.2724457Z Entering 'third_party/cutlass' 2025-12-04T14:00:01.2734283Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T14:00:01.2747219Z Entering 'third_party/fbgemm' 2025-12-04T14:00:01.2756289Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T14:00:01.2765918Z Entering 
'third_party/fbgemm/external/asmjit' 2025-12-04T14:00:01.2777979Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T14:00:01.2786944Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T14:00:01.2797277Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T14:00:01.2810420Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T14:00:01.2820057Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T14:00:01.2828725Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T14:00:01.2838288Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T14:00:01.2850693Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T14:00:01.2860624Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T14:00:01.2869189Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T14:00:01.2878176Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T14:00:01.2888394Z Entering 'third_party/fbgemm/external/json' 2025-12-04T14:00:01.2896686Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T14:00:01.2908014Z Entering 'third_party/flash-attention' 2025-12-04T14:00:01.2918295Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T14:00:01.2927301Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T14:00:01.2936447Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T14:00:01.2947219Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T14:00:01.2956565Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T14:00:01.2975962Z Entering 'third_party/flatbuffers' 2025-12-04T14:00:01.2990053Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T14:00:01.3000444Z Entering 'third_party/fmt' 2025-12-04T14:00:01.3011285Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T14:00:01.3021975Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T14:00:01.3033124Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T14:00:01.3047177Z Entering 'third_party/gloo' 2025-12-04T14:00:01.3057004Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T14:00:01.3065464Z Entering 'third_party/googletest' 2025-12-04T14:00:01.3074549Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:01.3083092Z Entering 'third_party/ideep' 2025-12-04T14:00:01.3094587Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T14:00:01.3102574Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T14:00:01.3121814Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T14:00:01.3143208Z Entering 'third_party/ittapi' 2025-12-04T14:00:01.3154114Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T14:00:01.3163127Z Entering 'third_party/kineto' 2025-12-04T14:00:01.3174486Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T14:00:01.3184875Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T14:00:01.3202399Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T14:00:01.3211775Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T14:00:01.3223758Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T14:00:01.3236890Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T14:00:01.3247690Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T14:00:01.3255836Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T14:00:01.3264648Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T14:00:01.3272833Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T14:00:01.3283751Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T14:00:01.3292818Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T14:00:01.3303385Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T14:00:01.3314099Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T14:00:01.3325519Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T14:00:01.3333847Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T14:00:01.3343831Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:01.3356148Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T14:00:01.3369303Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T14:00:01.3381352Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T14:00:01.3390234Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T14:00:01.3398994Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 
2025-12-04T14:00:01.3409611Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T14:00:01.3417650Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:01.3427428Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T14:00:01.3442287Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:01.3452227Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T14:00:01.3464990Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T14:00:01.3474738Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T14:00:01.3482502Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T14:00:01.3492838Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T14:00:01.3503282Z Entering 'third_party/kleidiai' 2025-12-04T14:00:01.3514007Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T14:00:01.3523459Z Entering 'third_party/mimalloc' 2025-12-04T14:00:01.3533336Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T14:00:01.3542244Z Entering 'third_party/nlohmann' 2025-12-04T14:00:01.3552139Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T14:00:01.3561955Z Entering 'third_party/onnx' 2025-12-04T14:00:01.3571779Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T14:00:01.3587375Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T14:00:01.3597372Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:00:01.3609642Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T14:00:01.3620500Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T14:00:01.3632175Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T14:00:01.3644368Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:00:01.3653579Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T14:00:01.3664752Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:01.3673292Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T14:00:01.3684955Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T14:00:01.3694302Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T14:00:01.3703881Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T14:00:01.3712322Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T14:00:01.3722378Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T14:00:01.3731197Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T14:00:01.3740754Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T14:00:01.3749465Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T14:00:01.3761194Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T14:00:01.3769882Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T14:00:01.3785732Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T14:00:01.3795651Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T14:00:01.3811076Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T14:00:01.3822271Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T14:00:01.3833679Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T14:00:01.3852323Z Entering 'third_party/pocketfft' 2025-12-04T14:00:01.3862582Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T14:00:01.3871258Z Entering 'third_party/protobuf' 2025-12-04T14:00:01.3882028Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T14:00:01.3896527Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T14:00:01.3912371Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T14:00:01.3926442Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T14:00:01.3937375Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:01.3951868Z Entering 'third_party/psimd' 2025-12-04T14:00:01.3963196Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T14:00:01.3974526Z Entering 'third_party/pthreadpool' 2025-12-04T14:00:01.3985530Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T14:00:01.3999878Z Entering 'third_party/pybind11' 2025-12-04T14:00:01.4015859Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:00:01.4025089Z Entering 'third_party/python-peachpy' 2025-12-04T14:00:01.4036941Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T14:00:01.4046981Z Entering 'third_party/sleef' 2025-12-04T14:00:01.4058428Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T14:00:01.4066716Z Entering 'third_party/tensorpipe' 2025-12-04T14:00:01.4076465Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T14:00:01.4084645Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T14:00:01.4102676Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T14:00:01.4112249Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T14:00:01.4122634Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T14:00:01.4136942Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T14:00:01.4146789Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T14:00:01.4159700Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T14:00:01.4177009Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T14:00:01.4184424Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T14:00:01.4199346Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T14:00:01.4229876Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4252363Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4267646Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4285693Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4304875Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4324356Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4338738Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4357772Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4374284Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4388694Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4403065Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4416487Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4436271Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4450225Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4464597Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4478735Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4492757Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4505605Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4519772Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4536433Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4554033Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4572966Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4593140Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4607738Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4627991Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4643053Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4662746Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4676756Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T14:00:01.4692279Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4705097Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4718497Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4731192Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4745052Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4758100Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4778737Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4792371Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4806855Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4821933Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4837111Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4852556Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4868950Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4891591Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4906473Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4923575Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T14:00:01.4938780Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4952440Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4965740Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4978756Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.4992217Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5012445Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5026442Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5039532Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5054737Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5071239Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5085587Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5105128Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5119857Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5135083Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5153858Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5179253Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T14:00:01.5195657Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5210391Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5228298Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5243092Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5256989Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5280561Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5301396Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5315984Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5331151Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5350423Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5368063Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5385429Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5406535Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5429385Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5445910Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5461890Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5476675Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5497839Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5512607Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5527484Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5542565Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T14:00:01.5650148Z Cleaning up orphan processes